Large Language Models

Red Queen AI: LLMs Evolve in Core War

Evolutionary tweaks let LLM-generated programs demolish 89% of human Core War warriors. But is this the petri dish preview of AI eating its own tail in real-world domains?

*Digital visualization of LLM-generated warriors battling in a Core War memory arena*

Key Takeaways

  • LLM-evolved warriors defeat 89.1% of human Core War programs via adversarial rounds.
  • Digital Red Queen uses MAP-Elites to prevent diversity collapse and build robustness.
  • Previews AI arms races in cybersecurity and economics, where agents endlessly adapt.

Evolutionary optimization against each human warrior generates a set of programs that collectively defeat 89.1% of them.

That’s from Sakana AI’s wild experiment, tossing large language models into a digital gladiator pit called Core War—a 1980s programming game where code snippets battle for memory dominance. I’ve covered enough AI hype cycles to smell the buzzwords from a mile away, but this one’s got teeth. LLMs prompting themselves to mutate Redcode assembly warriors, round after round, until they’re tougher than decades of human tinkerers.

Look. Core War? It's simple brutality: two programs share a circular block of memory, take turns executing instructions, and each tries to force the other into executing an illegal instruction and crashing. Sakana feeds GPT-4 mini the rules and a manual and says, "Make me a killer." Or "mutate this one." Boom: Digital Red Queen (DRQ) kicks in, using MAP-Elites to keep diversity alive and pitting newcomers against champion ancestors. No static benchmarks here; it's endless adaptation.
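Under some heavy simplifying assumptions (a two-opcode instruction set, none of the real MARS addressing modes), those mechanics can be sketched in a few lines of Python. The classic one-instruction "Imp" warrior shows why a program that keeps copying itself forward is hard to kill:

```python
# Toy sketch of Core War's execution model (not the real MARS simulator):
# a shared circular memory of instructions; a process dies when it
# executes DAT. The classic "Imp" warrior is a single MOV 0, 1 that
# endlessly copies itself one cell ahead.

CORE_SIZE = 8000

def make_core():
    # An empty core is filled with DAT 0, 0 (executing it kills a process).
    return [("DAT", 0, 0)] * CORE_SIZE

def step(core, pc):
    """Execute one instruction; return the new pc, or None if the process died."""
    op, a, b = core[pc]
    if op == "DAT":
        return None                      # executing DAT is fatal
    if op == "MOV":
        # Copy the instruction at pc+a into pc+b (addresses are relative,
        # and memory wraps around).
        core[(pc + b) % CORE_SIZE] = core[(pc + a) % CORE_SIZE]
    return (pc + 1) % CORE_SIZE

core = make_core()
core[0] = ("MOV", 0, 1)                  # the Imp
pc = 0
for _ in range(10):
    pc = step(core, pc)
assert pc == 10                          # the Imp is still alive, marching forward
assert core[5] == ("MOV", 0, 1)          # and has copied itself along the way
```

Real warriors are far nastier (bombers, replicators, scanners), but the core loop is exactly this: shared memory, alternating turns, death by bad instruction.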

What the Hell is Red Queen AI?

Borrowed from biology: hosts and parasites in an eternal chase, each forcing the other to evolve faster just to stay in place. Sakana's twist: LLMs as the red queen, driving an "adversarial evolutionary arms race." One-shot LLM warriors? Beat 1.7% of humans. Best-of-N sampling? 22.1%. But evolve 'em properly, and that jumps to an 89.1% defeat rate, tying or winning 96.3% of matchups.
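A minimal sketch of the MAP-Elites idea behind those numbers, with the LLM mutation step mocked out (all names here are illustrative, not Sakana's actual code): each behavioural niche keeps its own elite, so the archive can't collapse onto a single champion.

```python
import random

# Minimal MAP-Elites skeleton in the spirit of DRQ. In the real system the
# mutate step prompts an LLM to rewrite Redcode and fitness comes from
# actual Core War battles; here both are trivial stand-ins.

def mutate(warrior):
    # Stand-in for "prompt the LLM to modify this warrior".
    return warrior + [random.randint(0, 9)]

def fitness(warrior):
    # Stand-in for win rate against the current opponent pool.
    return sum(warrior)

def niche(warrior):
    # Stand-in behaviour descriptor, e.g. a program-length bucket.
    return min(len(warrior) // 2, 4)

def map_elites(seed, iterations=200):
    archive = {niche(seed): (fitness(seed), seed)}
    for _ in range(iterations):
        parent = random.choice(list(archive.values()))[1]
        child = mutate(parent)
        key = niche(child)
        if key not in archive or fitness(child) > archive[key][0]:
            archive[key] = (fitness(child), child)  # new elite for that niche
    return archive

archive = map_elites([1])
# Every occupied niche holds its own best-so-far warrior, so diversity
# survives by construction instead of by luck.
```

The design point is the grid: a plain hill-climber would throw away every "weird" mutant that underperforms today, and with it the stepping stones that beat tomorrow's opponents.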

Here’s the blockquote gold from their paper:

“We find that as DRQ is run for many rounds, warriors gradually become more generally strong, as measured by their performance against unseen human-designed warriors.”

Robust. Yeah, against 40-year-old human relics. But scale this to today? Cybersecurity’s already a red queen hellscape—offense vs. defense, ping-ponging exploits. Sakana nails it: “The cybersecurity arms race between offense and defense is well underway. Studying these adversarial dynamics in an artificial testbed like Core War offers critical insights.”

And that’s my skeptical nose twitching. Who profits? Not the humans writing Redcode nostalgia code. Sakana, sure—a Japanese startup chasing that LLM agent gold rush. But broader? This previews millions of AI agents scrapping in econ sims, stock trades, malware markets. Forget cooperative AGI fairy tales; it’s dog-eat-bot.

One-shot. Pathetic. But crank the evolution dial, and LLMs outpace decades of human tinkering. Preliminary runs stuck to GPT-4 mini; no juice needed from giants like full GPT-4. Cost-effective Darwinism.

Will Core War Spark Real AI Arms Races?

Short answer: Already has. Remember Stuxnet? DARPA’s cyber wargames? Now imagine that on steroids, with self-improving LLM hordes. My unique spin—flashback to the 1990s browser wars. Netscape vs. IE, features exploding till Microsoft crushed ‘em with bundling. AI agents won’t stop at code; they’ll evolve UIs, strategies, even narratives to win. Prediction: By 2027, we’ll see black-market LLM evolvers for pentesting, priced like artisanal coke. Governments? Racing to regulate—wait, that’s the next bit.

The newsletter teases “AI regulating AI,” but details cut off. Still, pattern’s clear: We built the monsters; now they babysit each other. Or do they? Sakana’s petri dish whispers no—competition trumps control.

But here's the cynicism: PR spin screams "insights into national security," yet it's a toy domain. Redcode ain't Rust or Python; LLMs hallucinate assembly like drunk sailors. Real evolution needs embodiment, stakes, cash flows. Still, 89%? That's no toy result.

Wander a sec: Import AI's Jack Clark nods to economic models shattering if AI builds AI. Normal world: AI juices GDP 1-2%. Automated-R&D world: exponential blowup. I've got both scenarios rattling around my head too. Burry (the Big Short guy), Patel, and McKenzie debating it in a Google Doc? Gold. But the incomplete snippet leaves us hanging; chunky futures ahead.

O-ring automation? Newsletter title drop, no meat. Classic Jack—tease tomorrow’s digest.

Bottom line: This ain't hype; it's a harbinger. AI niches turn into Core War coliseums. Humans? Spectators, or early exits.

Evolution works, scarily so.

Now, the dense dive: Sakana's DRQ sidesteps evolutionary cycles by battling ancestor champions; smart, and it echoes quality-diversity algorithms from evolutionary robotics. LLMs as mutation engines? Cheap, scalable. Prompt: "Modify this warrior to improve it." Repeat. But brittleness lurks; LLMs flop on edge cases, yet selection pressure forges generality. Against unseen human warriors? Measurable gains in robustness. Translate that to fraud detection: AI scammers evolve past AI guards, rinse, repeat. Who wins? The house with the deepest pockets for compute.
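The round structure described above might look roughly like this (hypothetical names; the real `evolve` step is a full LLM-driven evolutionary run scored in Core War battles). Fighting the whole hall of champions is what stops rock-paper-scissors cycles from quietly erasing progress:

```python
# Sketch of a DRQ-style round loop: each round evolves a new champion
# against every ancestor champion so far, not just the latest one.
# Names and the toy evolve step are illustrative, not Sakana's API.

def drq(initial_warrior, rounds, evolve):
    champions = [initial_warrior]
    for _ in range(rounds):
        # evolve(opponents) runs an inner evolutionary loop (e.g. MAP-Elites
        # with LLM mutations) whose fitness is performance vs. the whole pool.
        new_champion = evolve(opponents=champions)
        champions.append(new_champion)
    return champions

# Toy stand-in: "warriors" are numbers, and evolution finds one that
# outscores every ancestor in the pool.
champs = drq(0, 5, lambda opponents: max(opponents) + 1)
assert champs == [0, 1, 2, 3, 4, 5]   # strictly improving lineage
```

Swap the toy lambda for a real evolutionary run and the outer loop is unchanged; that separation is what makes the scheme cheap to reason about.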

Cynical aside—Silicon Valley loves these demos. Sakana’s blog? Polished, arXiv paper ready. But who’s funding? Japanese gov? VCs smelling defense contracts? Follow the yen.

Why Should Developers Care About This LLM Evolution?

You’re building agents? Test ‘em adversarially, or get red-queened. Static evals? Dead. Bake in DRQ-like loops—your bots need ancestor graveyards to toughen up.

FAQ time, searcher-style.



Frequently Asked Questions

What is Red Queen AI in Core War?

Sakana’s method evolving LLM-generated programs to battle in the 1980s memory game, mimicking endless adaptation like biological arms races.

How do LLMs beat humans at Core War?

Through DRQ: prompt GPT-4 mini to mutate warriors and optimize via MAP-Elites against past champions; the evolved set defeats 89.1% of human warriors.

Does this predict real-world AI competition?

Yes—cybersecurity, markets, anywhere agents scrap. Expect self-improving hordes outpacing static defenses.

Written by Marcus Rivera

Tech journalist covering AI business and enterprise adoption. 10 years in B2B media.


🧬 Related Insights

- **Read more:** [Daily Briefing: April 07, 2026](https://theaicatchup.com/article/daily-briefing-april-07-2026/)
- **Read more:** [Agentic AI's Hidden Exploits Expose Governance's Fatal Flaw](https://theaicatchup.com/article/can-your-governance-keep-pace-with-your-ai-ambitions-ai-risk-intelligence-in-the-agentic-era/)


Originally reported by Import AI
