My Samsung Galaxy S24 Ultra sat there, hot enough to double as a hand warmer, as it spat out a dungeon floor after eight minutes and forty-three seconds.
On-device AI. That’s the hook here — no cloud, no servers, just your phone’s brain grinding away. The dev behind this roguelike RPG experiment isn’t dreaming of world domination. He’s just building, because that’s what tinkerers do when a tech itch hits.
But let’s cut the romance. It’s slow. Phi-4-mini, a 3.8-billion-parameter model squeezed down to INT4, running CPU-only via ONNX Runtime on Android. No NPU magic yet; he’s banging his head against Qualcomm’s QNN HTP wall. Eight minutes for mob names, dialogue, boss patterns. Imagine that mid-game. Player rage-quits before the slime even speaks.
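To make “squeezed into INT4” concrete, here is a toy sketch of symmetric 4-bit quantization in Python. This is an illustration of the idea only, not the scheme ONNX Runtime or the dev actually uses; all names here are invented for the example.

```python
# Toy symmetric INT4 quantization: map floats to 16 levels in [-8, 7]
# with one shared scale factor. Illustrative only.

def quantize_int4(weights):
    """Quantize a list of floats to 4-bit ints plus a scale factor."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate floats from the 4-bit ints."""
    return [v * scale for v in q]

weights = [0.82, -0.35, 0.07, -0.91, 0.44]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
# Each restored weight lands within half a quantization step of the
# original, but those small errors compound across billions of weights.
```

That rounding error is exactly where the “words warp, logic frays” artifacts come from: each weight is close, but the model runs billions of them.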
Why Chase On-Device AI for Games?
Games forgive flaws. That’s the pitch.
As the dev puts it: “Games naturally absorb the limitations of small on-device models in a way that most other apps can’t.”
Spot on. Weird mob name? Lore flavor. Off-kilter boss banter? Immersive quirk. Roguelikes crave variety — every run’s a crapshoot anyway. Cloud AI shines for perfectionists; this stuff’s for chaos lovers.
Here’s my twist: remember the NES era? Cartridges packed procedural dungeons because storage sucked. Rogue itself ran on university Unix minicomputers around 1980. On-device AI? It’s that spirit rebooted: offline, infinite runs, no quarterly server bills. But smartphones aren’t there yet. NPU throughput roughly doubles each hardware generation, sure. Compression tricks evolve. Still, today’s “powerful” phone LLM feels like a 486 running Quake.
And roguelikes? Perfect lab rats. Permadeath hides repetition. Fresh content per floor keeps it addictive. Demon Lord’s Castle, 300 floors deep — AI spits out sets every five levels. Hero quests, hidden events, all local. No phoning home to OpenAI.
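The “sets every five levels” cadence implies a simple schedule worth spelling out. A minimal sketch, assuming the article’s numbers; the function names are hypothetical, not the dev’s code.

```python
# Hypothetical scheduling for "one AI-generated content set per five floors".

FLOORS_PER_SET = 5

def content_set_index(floor: int) -> int:
    """Floors 1-5 share set 0, floors 6-10 share set 1, and so on."""
    return (floor - 1) // FLOORS_PER_SET

def sets_needed(total_floors: int) -> int:
    """How many generation calls a full run needs."""
    return (total_floors + FLOORS_PER_SET - 1) // FLOORS_PER_SET

# A 300-floor castle needs 60 generation calls. At 8:43 (523 s) each on
# CPU, that's roughly 8.7 hours of raw inference unless sets are
# prefetched in the background while the player clears earlier floors.
```

Sixty calls is the real argument for background generation: kick off set N+1 the moment the player enters set N, and the 8:43 hides behind gameplay.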
But slow inference kills flow. 8:43 per chunk. That’s not playable; it’s a screensaver.
Is On-Device AI Ready for Prime Time?
No. Not even close.
The dev admits it: compared to cloud LLMs, nowhere near. Smartphones chug what GPUs ate for breakfast two years back. Direction’s clear, though — get in now, own the future. Bullish optimism? Or dev-bro hype?
Look, I’ve seen this movie. Flash promised rich apps in the browser forever. Then HTML5 ate it. Mobile GPUs teased native 3D glory; reality delivered Candy Crush clones. On-device AI’s the new shiny: privacy wins, no latency, zero server costs. Corporate spin screams it: Apple Intelligence, Gemini Nano. But under the hood? Token-per-second rates that’d make a Raspberry Pi blush.
He’s using Unity + ONNX Runtime Android. Built a C# tokenizer from scratch. KV cache engine. Stuff that breaks spectacularly. Next post promises the autopsy — what went wrong (mostly everything) and what didn’t (a wonky dungeon).
Prediction time, my unique sour note: this stays niche for five years. Indies hacking offline roguelikes. AAA? They’ll cloud-hybrid until NPUs hit desktop parity. Samsung S24 Ultra’s the Ferrari here — most phones? Golf carts.
The setup. Galaxy S24 Ultra, top-tier NPU waiting to be unleashed. Phi-4-mini because it’s small enough to fit, dumb enough to not hallucinate the apocalypse. ONNX bridges the gap — export from Hugging Face, import to Android hell.
Tokenizer woes first. LLMs need text split into tokens before inference, and Unity’s C# doesn’t ship a tokenizer. Scratch-build it, pray. Then the inference loop: a KV cache to avoid recomputing the whole context every token. Miss that, and you’re regenerating the hero’s name every prompt.
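Both pieces above fit in a few lines of sketch code. These are Python toys to show the two ideas, not the dev’s C# implementation; the vocabulary and cost model are invented for illustration.

```python
# 1) Greedy longest-match tokenizer over a tiny fixed vocabulary.
VOCAB = {"hero": 1, "slime": 4, "he": 2, "ro": 3,
         "h": 10, "e": 9, "r": 12, "o": 11, "s": 5, "l": 6, "i": 7, "m": 8}

def tokenize(text):
    """Match the longest vocabulary entry at each position."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try longest candidate first
            if text[i:j] in VOCAB:
                tokens.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return tokens

# 2) Why a KV cache matters: token positions processed to emit N tokens.
def decode_cost(prompt_len, new_tokens, use_cache):
    """Total positions the model must run to generate `new_tokens`."""
    if use_cache:
        # Prefill the prompt once, then one new position per output token.
        return prompt_len + new_tokens
    # Without a cache, the whole growing context is re-run every step.
    return sum(prompt_len + step + 1 for step in range(new_tokens))
```

With a 512-token prompt and 64 generated tokens, the cached loop touches 576 positions versus 34,848 without the cache, roughly a 60x difference. That gap is why “miss that” turns slow into unusable.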
NPU dreams dashed. QNN HTP? Qualcomm’s secret sauce for Snapdragon. Docs? A cryptogram. Forums? Echo chamber of pain. CPU fallback it is — hence the glacial pace.
But it works. Dungeon pops: slimes with punny names, bosses monologuing like bad D&D GMs. Charm in the glitches.
What Breaks — And What’s Next?
Implementation’s a minefield.
Unity Android plugins fight ONNX like cats in a sack. Quantization artifacts: words warp, logic frays. Prompt engineering? Critical. “Generate dungeon floor 5: mobs, events, boss” is too vague, and you get word salad. Nail the structure, get gold.
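What “nailing the structure” might look like: a prompt that demands labeled sections a parser can pull apart. The field names, counts, and wording here are invented for illustration, not taken from the dev’s project.

```python
# A hypothetical structured prompt in the spirit of "nail it, get gold".

def build_floor_prompt(floor: int, theme: str) -> str:
    """Ask for fixed, labeled sections so a small model can't ramble."""
    return (
        f"Generate dungeon floor {floor} of Demon Lord's Castle.\n"
        f"Theme: {theme}.\n"
        "Respond with exactly these labeled sections:\n"
        "MOBS: three monster names, each with a one-line description\n"
        "EVENT: one hidden event the player can trigger\n"
        "BOSS: a boss name plus a two-sentence attack pattern\n"
        "Keep each line under 120 characters. No text outside the sections."
    )

prompt = build_floor_prompt(5, "flooded crypt")
```

Fixed labels double as a validation hook: if the reply is missing a `BOSS:` line, regenerate instead of shipping salad to the player.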
Future: NPU unlock. Models shrink further; 1B-parameter models keep closing the gap on much larger ones. Battery life? The real boss fight. Eight minutes of sustained inference drains 10% easy.
Corporate angle? Samsung pushes this hard. Galaxy AI ads gloss over the slowness. It’s PR spinach — healthy future, tastes like cardboard now.
My take: admirable hack. Teaches more than any tutorial. But don’t bet the farm. Cloud’s king till 2028, easy.
Devs reading this: clone the repo when it drops. Tweak prompts. Chase that NPU high. Rest of us? Watch from the sidelines, popcorn in hand.
Why Does This Matter for Indie Devs?
Indies starve on server costs. Procedural gen’s free lunch — if it runs local.
Roguelike twist amplifies it. Freshness without artists. Dialogue without writers. Scale to 1000 floors? Why not.
Skepticism check: charm fades fast. Players want 60fps, not 8-minute waits. Polish the demo, or it’s DOA.
Historical parallel — Zork’s text parser on 48K machines. On-device AI’s that, graphical. Limits breed creativity. Abuse them, flop.
Frequently Asked Questions
What does on-device AI mean for roguelike games?
It means offline, infinite procedural dungeons — slow today, snappier tomorrow. Perfect for permadeath runs craving variety.
How to run Phi-4-mini on Android with Unity?
ONNX Runtime plugin, custom C# tokenizer, KV cache. Expect CPU slowness sans NPU; fight QNN docs for speed.
Will on-device AI make cloud LLMs obsolete in games?
Not soon. Cloud still wins on quality and polish; local shines for privacy, offline play, and zero server costs. Hybrid rules.