What if the biggest bottleneck in your programs isn’t the code — it’s the assumption that one thread rules them all?
Ryan Fleury’s manifesto, Multi-Core By Default, hits like a rogue wave in the calm sea of programming orthodoxy. Right out of the gate, it’s proposing a world where parallelism isn’t an afterthought, a bolted-on library you wrestle with at 2 a.m. No. Multi-core utilization baked in from the language level up. Fleury, the guy behind some sharp tools in the Zig ecosystem and handmade coding lore, argues we’ve got dozens of cores sitting idle because programmers are still trained on 1990s mental models.
Look. CPUs have been multi-core for two decades now. Your laptop? 8, 12, 16 threads humming away. But most apps? They sip from one straw, leaving the rest parched. Fleury’s fix: redesign languages and runtimes so concurrency is the default path, not a detour.
How Does ‘Multi-Core By Default’ Actually Work?
Fleury breaks it down surgically. Start with the scheduler — not some OS black box, but a deliberate layer in your runtime that divvies work across cores automatically. Think data parallelism first: split arrays, loop bodies, anything iterable, and dispatch chunks without you spelling out threads or locks.
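To make that concrete, here's a minimal chunk-and-dispatch sketch in Go, purely for illustration (a runtime like the one Fleury describes would do this implicitly; the explicit goroutines and WaitGroup here are exactly the plumbing he wants hidden):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelMap splits the slice into one chunk per core and hands
// each chunk to its own goroutine. This is a toy sketch of the
// "default data parallelism" idea, not Fleury's actual runtime.
func parallelMap(data []int, f func(int) int) {
	workers := runtime.NumCPU()
	chunk := (len(data) + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < len(data); start += chunk {
		end := start + chunk
		if end > len(data) {
			end = len(data)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				data[i] = f(data[i]) // disjoint ranges: no locks needed
			}
		}(start, end)
	}
	wg.Wait()
}

func main() {
	xs := make([]int, 1000)
	for i := range xs {
		xs[i] = i
	}
	parallelMap(xs, func(x int) int { return x * x })
	fmt.Println(xs[3], xs[999]) // 9 998001
}
```

The point of the vision: you'd write only the loop body, and the chunking, dispatch, and join would be the compiler's problem.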
He draws from battle-tested ideas. SIMD vectorization? That’s single-core baby steps; now scale it to cores. But here’s the kicker — Fleury wants task graphs as the primitive. You declare dependencies, not execution order. The runtime figures the parallelism, steals cycles from idle cores, even migrates tasks mid-flight.
“The goal is to make single-threaded performance a byproduct of multi-core correctness, not the primary target.”
That’s Fleury, crystal clear. No more “it works on my machine” threading bugs. Correctness first, speed second — and speed comes free.
But wait. Isn’t this what everyone promises? Async/await in JS, goroutines in Go, Erlang actors. Fleury says nah — those are opt-in comforts. He wants zero-cost defaults: write sequential-looking code, get parallel execution. The inverse of how Python’s GIL quietly serialized everything, until it couldn’t.
Wild.
This isn’t pie-in-the-sky. Fleury prototypes it in Zig, where comptime magic already blurs compile-run lines. Compile-time analysis spots parallelizable regions — loops without side effects, pure functions — and emits core-spanning code. No macros, no metaprogramming hacks. Just sane defaults.
Dig deeper: memory matters. Shared heaps kill scalability; Fleury pushes per-core arenas, with coherent migration only when tasks hand off. Cache lines? Respected automatically. False sharing? Compiler warns or rewrites.
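Per-core memory is easy to sketch. The toy reduction below gives each worker a private, cache-line-padded slot and merges once at the end; the 64-byte line size and the Go code itself are illustrative assumptions, not Fleury's design:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// paddedSum keeps each worker's accumulator on its own (assumed
// 64-byte) cache line, so neighboring slots never false-share.
type paddedSum struct {
	val int64
	_   [56]byte // 8-byte int64 + 56 bytes pad = 64 bytes
}

func parallelSum(data []int64) int64 {
	workers := runtime.NumCPU()
	sums := make([]paddedSum, workers)
	chunk := (len(data) + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(data) {
			break
		}
		hi := lo + chunk
		if hi > len(data) {
			hi = len(data)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				sums[w].val += data[i] // private slot: zero contention
			}
		}(w, lo, hi)
	}
	wg.Wait()
	var total int64
	for i := range sums {
		total += sums[i].val // single hand-off at the end
	}
	return total
}

func main() {
	data := make([]int64, 100000)
	for i := range data {
		data[i] = 1
	}
	fmt.Println(parallelSum(data)) // 100000
}
```

Drop the padding and the slots share cache lines, and throughput craters under contention. Fleury's bet is that the compiler, not you, should be doing this bookkeeping.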
And the why? Architecture’s shifted. ARM’s big.LITTLE, Intel’s P/E cores — heterogeneity screams for smart schedulers. Write once, run everywhere, cores auto-assigned by priority.
Why Hasn’t Multi-Core Been Default Already?
Blame history. C’s one-thread-unless-you-fork() model stuck because it was simple. POSIX threads? A nightmare API born from Unix sins. Languages piled on sugar — OpenMP pragmas, TBB flows — but never flipped the script.
Fleury calls it out: PR spin from chip makers. “More cores!” they yell, while software chugs single-threaded. It’s like selling a 16-lane highway but paving one lane.
My unique angle? This echoes the GPU revolution. Two decades back, general-purpose GPU work meant contorting graphics shaders; CUDA and OpenCL made massive parallelism the default model. Result? Games and compute leaped forward. Multi-core by default could do the same for CPUs — but for everything. Prediction: indie game engines adopt first, then servers. By 2027, single-thread perf benchmarks die.
Skeptical? Fair. What about Amdahl’s law — that 5% serial bit? Fleury shrugs: make it shrinkable. Profile-guided decomposition at runtime.
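The ceiling Amdahl imposes is worth actually computing: for serial fraction s on n cores, the best speedup is 1 / (s + (1 - s)/n). A quick Go sketch of the arithmetic:

```go
package main

import "fmt"

// amdahl returns the maximum speedup for a workload with serial
// fraction s running on n cores.
func amdahl(s, n float64) float64 {
	return 1.0 / (s + (1.0-s)/n)
}

func main() {
	// That "5% serial bit" caps a 16-core machine well below 16x...
	fmt.Printf("%.2f\n", amdahl(0.05, 16)) // 9.14
	// ...and even unlimited cores top out at 1/s = 20x.
	fmt.Printf("%.2f\n", amdahl(0.05, 1e9)) // 20.00
}
```

Which is exactly why Fleury's "make it shrinkable" matters: the serial fraction, not the core count, is the lever.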
Hardware’s ready — flagship Zen 5 and Arrow Lake chips pack 24 to 32 threads — but devs aren’t. Training wheels needed. Fleury’s vision trains us: write declaratively, let the runtime go imperative across cores. Tools like his profiler visualize the task graph, showing steals and migrations. It’s intoxicating.
Corporate hype check: None here. Fleury’s indie, no VC fluff. Pure engineering lust.
Is Multi-Core By Default Ready for Prime Time?
Roadblocks loom. Determinism? Fleury opts for statistical fairness over strict ordering — good enough for 99% of cases, and reproducible for debugging via fixed seeds. Real-time? Extensions cover it.
Energy sip: idle cores power down; active ones finish sooner and sleep earlier.
Adoption? Start small — numerical libs, servers. Zig’s foothold helps. But expect pushback: “My app’s I/O bound!” Sure, but CPU bursts still matter.
Here’s the thing — this flips the mental model. From “how do I thread this?” to “prove it’s not parallelizable.”
🧬 Related Insights
- Read more: Transformers: The Engine Under GPT’s Hood, Minus the Hype
- Read more: HBM4 Widens the Pipe—Memory Wall Shifts Right
Frequently Asked Questions
What is multi-core by default?
Ryan Fleury’s approach to programming where runtimes automatically parallelize code across CPU cores, making multi-threading the norm without manual effort.
How does multi-core by default improve performance?
By defaulting to task-based parallelism with smart scheduling, it utilizes all cores for speedups of 4-16x on multi-core hardware, especially for compute-heavy tasks.
Will multi-core by default work in existing languages like C++ or Rust?
Not natively yet — needs runtime/library support, but Fleury prototypes in Zig with ports possible via new compilers or shims.