AI Hardware

HBM4 Roadmap: Custom Dies & AI Memory

AI folks banked on stacking more HBM layers to feed ravenous models. Nope—custom base dies, shoreline squeezes, and vendor drama are flipping the script, for better or worse.

[Image: Stacked HBM4 dies with custom base logic on an AI accelerator's chip edge]

Key Takeaways

  • HBM4 introduces custom base dies to boost efficiency and dodge shoreline limits.
  • Nvidia dominates demand, but supply crunches loom with Samsung's struggles.
  • Custom innovations mask deeper issues — expect fragmented supply and higher costs.

Everyone figured HBM’s future was simple: pile on more layers, crank bandwidth, watch AI chug along.

Wrong.

HBM4 drops custom base dies — logic bottoms that turbocharge stacks — plus shoreline expansions to cram memory right up against the chip’s edge. It’s not just evolution; it’s a frantic scramble past the memory wall, where AI’s bit hunger outpaces everything. Nvidia’s Rubin Ultra eyes 1TB per GPU, Broadcom’s TPUs swell, OpenAI tinkers. Demand? Exploding. Supply? A joke.

And here’s the kicker — this isn’t smooth sailing.

What Was Everyone Expecting From HBM?

Straight scaling. More stacks, HBM3E to HBM4, 12-high towers wired together with through-silicon vias (TSVs) that bloat die sizes roughly 85% versus DDR. SK Hynix leads, Samsung lags, Micron plays catch-up. Bandwidth soars via ultra-wide buses, 1,000-plus wires per stack, demanding fancy 2.5D interposers like TSMC's CoWoS. AI accelerators? All-in on HBM. No substitutes cut it; DDR5 flops on bandwidth, SRAM skimps on density.
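To see why the ultra-wide bus is the whole game, here's a minimal back-of-envelope sketch. The bus widths and per-pin rates are publicly quoted ballpark figures, assumed here for illustration rather than taken from this article:

```python
# Back-of-envelope HBM bandwidth: bus width x per-pin data rate.
# Figures below are public ballpark numbers, assumed for illustration.

def stack_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack, in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8  # bits -> bytes

# HBM3E-class: 1024-bit interface at ~9.6 Gbps/pin -> ~1.2 TB/s per stack
print(stack_bandwidth_gbs(1024, 9.6))  # 1228.8

# HBM4-class: the interface widens to 2048 bits, so even at a lower
# per-pin rate each stack roughly doubles its bandwidth
print(stack_bandwidth_gbs(2048, 8.0))  # 2048.0
```

Two thousand wires per stack is also exactly why plain organic substrates don't cut it and 2.5D interposers become mandatory.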

Roadmaps screamed capacity bumps: 288GB per GPU today, 1TB tomorrow. Simple, right? Pump bits, train bigger models, profit. But physics — or packaging — bites back.
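The capacity roadmap is simple multiplication, which is why it looked so easy on paper. A sketch, with stack counts and die densities as illustrative assumptions, not vendor specs:

```python
# Package capacity = stacks x dies-per-stack x die density.
# Configurations below are illustrative assumptions, not vendor specs.

def gpu_hbm_capacity_gb(stacks: int, dies_per_stack: int, die_gbit: int) -> float:
    """Total HBM capacity on one GPU package, in GB."""
    return stacks * dies_per_stack * die_gbit / 8  # Gbit -> GB

# 8 stacks of 12-high 24Gbit dies -> the 288GB-class package of today
print(gpu_hbm_capacity_gb(8, 12, 24))   # 288.0

# ~1TB needs everything at once: more stacks, taller stacks, denser dies
print(gpu_hbm_capacity_gb(16, 16, 32))  # 1024.0
```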

Shoreline crunch. HBM hugs only two SoC edges; other I/O claims the rest. Vertical stacks help, but capacity caps hit fast. Energy? Latency? Those wide paths guzzle power unless dies kiss the compute core.
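Here's a toy model of the shoreline problem: the binding constraint is edge length, not die area. Every number below (reticle-class die dimensions, per-stack PHY footprint) is an assumption for illustration:

```python
# Shoreline math: HBM PHYs live on die edges, so edge length caps stack count.
# All dimensions are illustrative assumptions, not vendor figures.

DIE_W_MM, DIE_H_MM = 26.0, 33.0  # roughly reticle-limit die
HBM_SITE_MM = 11.0               # edge length one HBM stack/PHY site needs

def max_stacks(usable_edges_mm: list[float], site_mm: float = HBM_SITE_MM) -> int:
    """HBM sites that fit along the edges not claimed by other I/O."""
    return sum(int(edge // site_mm) for edge in usable_edges_mm)

print(max_stacks([DIE_H_MM, DIE_H_MM]))                      # 6: two edges only
print(max_stacks([DIE_W_MM, DIE_W_MM, DIE_H_MM, DIE_H_MM]))  # 10: all four edges
```

Adding die area buys you nothing here; only freeing more shoreline, or making each PHY site denser, moves the cap. That's the whole point of the expansion tricks discussed below.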

Why Does HBM4’s Custom Base Die Actually Matter?

“HBM combines vertically stacked DRAM chips with ultra-wide data paths and has the optimal balance of bandwidth, density, and energy consumption for AI workloads.”

That’s the primer — spot on, but naive. Custom base dies flip it: logic layer at stack bottom handles buffering, PHYs, even compute-offload tricks like KV-cache. No more generic bases; tailor for Nvidia, AMD, OpenAI customs. Samsung qualifies, but whispers say they’re toast — yield woes, China pushback on domestic HBM.
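Why is KV-cache the poster child for base-die offload? Because at long context it devours capacity on its own. A standard sizing formula, with the model shape assumed for illustration (the article names no specific model):

```python
# Transformer KV-cache sizing: 2 (K and V) x layers x kv_heads x head_dim
# x tokens x bytes. Model shape below is an assumption for illustration.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per: int = 2) -> float:
    """KV-cache footprint in GB for one batch of sequences."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per / 1e9

# 80 layers, 8 GQA KV heads, head_dim 128, 128k context, batch 8, fp16
print(kv_cache_gb(80, 8, 128, 128_000, 8))  # ~335.5 GB
```

Hundreds of gigabytes of cache per node is why pushing its management down into stack-bottom logic looks so attractive.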

Look, this reeks of desperation. Vendors hoard TSV tools and convert DDR lines at a crawl. Explosive demand, with Nvidia owning the lion's share of 2027 supply, starves everyone else. Broadcom surges on TPUs; SoftBank/OpenAI side projects nibble. Result? Price premiums stick, shortages loom.

But — and it’s a big but — shoreline expansion saves the day? Repeaters, PHY offloads, LPDDR hybrids, even “beachfront” tricks stretch edges. Compute under memory? SRAM tags? Wild ideas to dodge limits.

Which raises the question: is this peak HBM, or the prelude to a crash?

History rhymes hard. Remember 1980s DRAM wars? Oligopoly formed — Samsung, Hynix forebears — crushed innovators via capacity floods. Now, HBM cartel brews: three vendors, TSMC packaging chokehold. My bold call? Custom dies fragment it — hyperscalers demand bespoke, birthing a vendor split that tanks yields, spikes costs 2x by 2028. Nvidia wins short-term; startups die.

Is Samsung’s HBM Dream Dead?

They're pushing: qualification ramps, China plants for a domestic escape hatch. But subscribers hear the dirt: viability is tanking. One tech shift, maybe memory-controller offloads, could flip capacity trends, ditching endless stacks for smarter pooling. Disaggregated prefill? Wide high-rank EP? Niche, but it hints at a post-HBM world.

Supply chain? Upended. HBM bits skyrocket alongside AI ASICs, yet custom everything means no economies of scale. Advanced packaging has gone mainstream; MR-MUF is a buzzword everyone throws around now. Energy efficiency? Still lags; those TSVs chew juice.

Punchy truth: HBM’s premium holds because nothing else works. Yet.

Vendors dance — SK Hynix dominates, Samsung scrambles, Micron lurks. Accelerators evolve: Nvidia’s aggressive, AMD follows, OpenAI experiments. All chase that balance: capacity sans latency hell.

And the wall? It's not scaled, it's circumvented. Custom dies, shoreline hacks: clever, sure. Corporate spin calls it revolutionary. I call bullshit. It's a bandage on a bullet wound; the true fix needs photonics or CXL pooling, not this kludge.

Dense dive: manufacturing is hell. TSVs demand retooled fabs, and a 12-high stack plus base die means 13 layers of compounding yield risk (see the sketch below). Back-end packaging? CoWoS capacity bottlenecks at TSMC. China domestic plants? A geopolitics nightmare, with US curbs looming.
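A quick sketch of why tall stacks kill yield: every bond and TSV step compounds, and one flawed layer scraps the whole tower. The per-step yields below are illustrative assumptions, not fab data:

```python
# Stack yield compounds per assembly step: one flawed layer scraps the stack.
# Per-step yields are illustrative assumptions, not fab data.

def stack_yield(per_step_yield: float, layers: int) -> float:
    """Compound yield after `layers` sequential bond/TSV steps."""
    return per_step_yield ** layers

for y in (0.99, 0.98):
    print(f"per-step {y:.0%}: 12-high {stack_yield(y, 12):.1%}, "
          f"16-high {stack_yield(y, 16):.1%}")
# per-step 99%: 12-high 88.6%, 16-high 85.1%
# per-step 98%: 12-high 78.5%, 16-high 72.4%
```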

Short take. HBM rules AI hardware. For now.

Why Does This Matter for AI Builders?

You’re gluing HBM to GPUs. Expect delays, premiums. Rubin Ultra’s 1TB? Dreamy, but shared scarcity hits all. Offloads like KV-cache to base dies? Efficiency win — 20-30% power drop, maybe — but debug hell.

Prediction: by 2027, HBM5 whispers emerge, but HBM4 customs rule. Supply implodes if Samsung folds.

Skeptical wrap: hype masks fragility. AI’s memory feast devours fabs whole.


Frequently Asked Questions

What is HBM and why is it crucial for AI?

HBM’s stacked DRAM with fat bandwidth buses — perfect for AI’s data deluge, trouncing DDR on speed-density mix.

When will HBM4 hit production and fix shortages?

Samsung qualifies soon, but customs delay mass ship; shortages drag into 2027.

Will custom HBM dies kill off smaller vendors?

Likely — hyperscalers lock in Nvidia/SK Hynix duopoly, squeezing everyone else.

Written by Aisha Patel

Former ML engineer turned writer. Covers computer vision and robotics with a practitioner perspective.



Originally reported by SemiAnalysis
