On-Device AI Roguelike in Unity on Android

Picture this: a roguelike RPG where your phone's brain — not some distant server — dreams up deadly floors on the fly. One dev did it with Phi-4-mini in Unity, clocking 8 minutes 43 seconds for JSON gold on a Galaxy S24 Ultra.

8:43 to AI-Generated Dungeons on a Phone: My On-Device Roguelike Experiment — theAIcatchup

Key Takeaways

  • Phi-4-mini runs in Unity roguelikes on Android, generating valid JSON dungeons in ~9 minutes on S24 Ultra.
  • NPU acceleration underdelivers for now (the QNN lib fails to load), but CPU baselines work — hinting at explosive future potential.
  • Key hacks: Custom tokenizer, adb model deploy, SDK 31 for vendor libs—blueprint for on-device AI games.

8 minutes. 43 seconds. That’s all it took for a Samsung Galaxy S24 Ultra to birth a dungeon’s worth of JSON — floors, mobs, HP, attacks — straight from an on-device LLM.

Imagine your pocket rocket, that sleek slab of Snapdragon silicon, churning through 150 tokens of roguelike madness without phoning home to the cloud. No latency hiccups. No data slurped by corps. Just pure, local AI wizardry powering a Unity-built RPG. We’re talking Phi-4-mini, Microsoft’s 3.8B beast, squeezed into an Android game loop.

And here’s the kicker — it actually worked. Valid JSON. Korean mob names included (prompt glitch, but who cares?).

Why Cram a Language Model into a Roguelike?

Look, roguelikes thrive on procedural chaos — endless replays, brutal randomness. But what if that chaos got smart? An LLM whispering enemy placements, boss stats, even flavor text, all tailored to your run. On-device means it’s instant, private, yours. No API keys begging for mercy.

This dev didn’t mess around. Chose Phi-4-mini because Microsoft’s ONNX drop is plug-and-play — 4.9GB quantized, slots into 12GB RAM like it was born there. Smaller models? They crumble on JSON formatting, spitting garbage instead of arrays.

"[LLM] Generated in 181.4s (150 tokens max) JSON came out: [ {"floor":1,"mob":"게으른 빵집 아들","hp":50,"atk":10}, {"floor":4,"mob":"elite","hp":100,"atk":20}, {"floor":5,"mob":"boss","hp":200,"atk":40} ]"

Boom. Structured output. Even if the mob names came back in Korean (the first, "게으른 빵집 아들", is roughly "lazy baker's son"), echoing the prompt's language. Fix later, dude.
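Before JSON like that touches game state, you'd want to validate it. A minimal Python sketch (the game itself is C#; the field names match the logged output above, everything else here is assumed):

```python
import json

# Hypothetical validator for the dungeon JSON the model emits.
# Field names (floor, mob, hp, atk) come from the logged output above.
REQUIRED = {"floor": int, "mob": str, "hp": int, "atk": int}

def parse_dungeon(raw: str) -> list:
    floors = json.loads(raw)
    if not isinstance(floors, list):
        raise ValueError("expected a JSON array of floors")
    for entry in floors:
        for key, typ in REQUIRED.items():
            if not isinstance(entry.get(key), typ):
                raise ValueError(f"bad or missing field: {key}")
    return floors

raw = '[{"floor":1,"mob":"slime","hp":50,"atk":10},{"floor":5,"mob":"boss","hp":200,"atk":40}]'
dungeon = parse_dungeon(raw)
print(dungeon[1]["mob"])  # boss
```

Reject-and-retry on a ValueError is the cheap insurance policy when a small model occasionally garbles its brackets.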

ONNX Runtime? Cross-platform godsend. Unity C# bindings via asus4’s package. Flip accelerators — QNN for Snapdragon NPU, NNAPI, CoreML — one code line. Unity itself? 2D roguelike heaven, Android/iOS builds, C# everywhere. No Python shame.

Min SDK 31 seals it — Android 12 unlocks vendor libs for QNN’s libcdsprpc.so. Drop lower? Kiss NPU goodbye.
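One related Android 12 gate, not covered in the original write-up, so treat this as a hedged sketch: from API 31, an app must declare vendor native libraries in its manifest before the loader will expose them at all.

```xml
<!-- AndroidManifest.xml sketch: from API 31, vendor native libraries
     must be declared here, or dlopen("libcdsprpc.so") fails -->
<application>
    <uses-native-library android:name="libcdsprpc.so" android:required="false" />
</application>
```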

Can Your Flagship Phone Run a 3.8B Model in a Game?

Short answer: Barely. S24 Ultra’s Snapdragon 8 Gen 3 boasts Hexagon NPU, 12GB RAM. Dev pushed the 4.9GB model via adb (94 seconds, sweet). APK stays lean.

But the numbers? Mac Editor (CPU): 246s, 0.45 tok/s. S24 CPU-only: 523s, 0.21 tok/s. QNN HTP? 490s, 0.31 tok/s. The phone runs 2.1x slower than the Mac, and the NPU path is barely a bump over the phone's own CPU.

Logs spilled the tea: "Failed in loading stub: dlopen failed: library 'libcdsprpc.so' not found." QNN registers, but the backend ghosts. NPU promise, CPU delivery.

Here's my hot take: this mirrors 2007's iPhone. Mobile GPUs were toys then; the App Store and OpenGL ES (Metal came years later) unlocked the mobile gaming explosion: Angry Birds, Infinity Blade. NPUs are at that same awkward stage now. Clunky, half-baked, but give devs six months? On-device AI games go from gimmick to genre. Imagine No Man's Sky-style procedural galaxies, but per-device, per-run, LLM-fueled. Cloud? Obsolete for indies.

The Gory Build Breaks — And Fixes That Saved the Day

Dev life: 90% wrestling daemons.

Tokenizer? No C# tiktoken for Phi. Wrote one from tokenizer.json — 200k vocab, merges as arrays (not strings, oops), GPT-2 byte hacks, BPE cache.
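The dev's tokenizer is C#, but the merge loop it has to replicate is easy to sketch. A toy Python version with an invented four-entry merges table (the real tokenizer.json carries a ~200k vocab plus byte-level preprocessing this skips):

```python
# Toy byte-pair-encoding merge loop: repeatedly fuse the highest-priority
# adjacent pair. This is the core logic a C# tokenizer port must replicate.
# The MERGES table is invented for illustration.
MERGES = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]
RANK = {pair: i for i, pair in enumerate(MERGES)}

def bpe(word: str) -> list:
    parts = list(word)
    while len(parts) > 1:
        # Find the adjacent pair with the lowest (best) merge rank.
        pairs = [(RANK.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(parts, parts[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break  # no more applicable merges
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts

print(bpe("hello"))  # ['hello']
```

Caching results per word (the "BPE cache" mentioned above) matters because this loop is quadratic-ish and game prompts repeat tokens constantly.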

KV-cache decoding: 32 layers, 8 KV heads, 128 head_size. Prefill the prompt, then decode token by token. Tensors rebelled — DenseTensor ctor args flipped.
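A back-of-envelope on what those dimensions cost in memory, assuming fp16, batch 1, and the usual ONNX past-key-values layout (a sketch, not profiler output):

```python
# KV cache sizing for the decode loop described above.
# Shape per layer, per K or V tensor: (batch, num_kv_heads, seq_len, head_size).
layers, kv_heads, head_size = 32, 8, 128
bytes_per = 2  # fp16

def kv_cache_bytes(seq_len: int, batch: int = 1) -> int:
    per_tensor = batch * kv_heads * seq_len * head_size * bytes_per
    return layers * 2 * per_tensor  # 2 = one K and one V per layer

mb = kv_cache_bytes(seq_len=2048) / 1024**2
print(f"{mb:.0f} MiB")  # 256 MiB
```

Short dungeon prompts keep this tiny, which is part of why a 4.9GB model still fits a 12GB phone with a game running alongside it.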

model.onnx path? Off by one directory level: ../../.. became ../..

CompressReleaseAssets? A 5GB StreamingAssets payload smashed Java's 2.1GB array limit. Nuked the Gradle cache (15GB!), exiled the model outside Assets, adb push to the app's files dir.

Fonts? TMP’s LiberationSans skips Korean. Converted AppleSDGothicNeo.ttc — Custom Range: 32-126,44032-55203,12593-12686 (decimal Hangul hell).
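Sanity-checking those decimal ranges against Unicode:

```python
# The three ranges from the TMP Custom Range field, decoded.
# 32-126: printable ASCII; 44032-55203: Hangul syllables (U+AC00..U+D7A3);
# 12593-12686: Hangul compatibility jamo (U+3131..U+318E).
ranges = [(32, 126), (44032, 55203), (12593, 12686)]
for lo, hi in ranges:
    print(f"{lo}-{hi}: U+{lo:04X}..U+{hi:04X}  {chr(lo)} .. {chr(hi)}")

assert chr(44032) == "가"            # first Hangul syllable
assert 44032 <= ord("빵") <= 55203   # the mob-name syllables are covered
```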

Newtonsoft.Json cast? JValue to JArray flop — merges were arrays, not strings.
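The same gotcha, sketched in Python: newer tokenizer.json files serialize each merge as a two-element array, while older GPT-2-style files use a single "a b" string, so a loader that casts straight to string breaks. A tolerant reader (the JSON excerpt is invented for illustration):

```python
import json

# Hypothetical excerpt of a tokenizer.json "merges" section: newer files
# use two-element arrays; older GPT-2-style files use "a b" strings.
raw = '{"model": {"merges": [["h", "e"], "l l"]}}'

def load_merges(doc: str) -> list:
    merges = json.loads(doc)["model"]["merges"]
    out = []
    for m in merges:
        if isinstance(m, str):   # legacy "a b" string form
            a, b = m.split(" ")
        else:                    # modern ["a", "b"] array form
            a, b = m
        out.append((a, b))
    return out

print(load_merges(raw))  # [('h', 'e'), ('l', 'l')]
```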

It all clicked. Dungeon JSON flows.

Why Does On-Device AI Matter for Game Devs?

Energy surge here. We’re at the platform shift — AI as the new OS layer. Unity plugins today, tomorrow every engine ships NPU hooks. Phones pack 40+ TOPS NPUs; desktops lag.

Corporate spin? Qualcomm hypes Hexagon, but real-world? CPU fallback city. Samsung? Test on the Ultra first — if it flops there, mid-range phones don't stand a chance. But benchmarks lie; dev logs don't.

Prediction: 2025 sees roguelikes — nay, full RPGs — with LLM companions. Your NPC banters evolve mid-fight. Dungeons remember your cheese strats. All local. Battery sip? Optimize prompts, quantize harder, boom.

Skeptics whine “too slow.” 8:43 per gen? Turn-based roguelike laughs. Async it between turns. Players won’t notice.
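That between-turns idea can be sketched with a background thread; generate_floor here is a hypothetical stand-in for the real multi-minute LLM call, and the Unity version would use a C# Task or coroutine instead:

```python
import threading
import queue
import time

results = queue.Queue()

def generate_floor(floor: int) -> dict:
    # Stand-in for the real on-device LLM call (which takes minutes).
    time.sleep(0.01)
    return {"floor": floor, "mob": "slime", "hp": 50, "atk": 10}

def prefetch(floor: int) -> None:
    # Kick off generation for the NEXT floor while the player
    # fights through the current one.
    threading.Thread(target=lambda: results.put(generate_floor(floor)),
                     daemon=True).start()

prefetch(2)                          # start generating floor 2 during floor 1
# ... player takes their turns on floor 1 ...
next_floor = results.get(timeout=5)  # ready by the time they descend
print(next_floor["floor"])  # 2
```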

This isn’t hype. It’s the spark. Like id Software cramming Quake onto 486s — constraints breed genius.



Frequently Asked Questions

How do I run Phi-4-mini in Unity on Android? Grab ONNX cpu_and_mobile from HF, asus4/onnxruntime-unity NPM, SDK 31+, adb model push. Write tokenizer if needed. Test on S24 Ultra tier.

Is Snapdragon NPU acceleration ready for games? Not yet — QNN often fails lib loads, falls to CPU. But it’s progress; expect fixes in Android 15 updates.

Will on-device AI replace cloud for mobile games? For procedural gen like roguelikes? Absolutely, soon. Privacy, speed, offline wins big.

Priya Sundaram
Written by

Hardware and infrastructure reporter. Tracks GPU wars, chip design, and the compute economy.



Originally reported by dev.to
