Claude Mythos Finds 27-Year-Old Bugs

Twenty-seven years. That's how long a denial-of-service bug hid in OpenBSD's TCP stack, shrugging off reviews and audits. Then Claude Mythos showed up and nailed it cold.

Key Takeaways

  • Claude Mythos uncovers a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw missed by humans and 5M fuzz tests.
  • Partners like AWS, Google, Apple fund it heavily — signaling big defense plays ahead.
  • No public release: Anthropic controls access to stay ahead of attackers.

A bug festering in OpenBSD for 27 damn years.

Nobody caught it. Not through code reviews, security audits, version bumps. Zilch. Then FFmpeg: 16 years of a sneaky out-of-bounds write, surviving 5 million fuzzing runs like it was invincible.

Enter Claude Mythos Preview from Anthropic. This AI model reads the code, no hints, no hand-holding — and bam, vulnerabilities pop out. Project Glasswing, they call it. Sounds fancy, but let’s cut the spin: it found stuff humans missed for decades.

That OpenBSD flaw? Signed integer overflow in the TCP SACK code, from 1999. Remote DoS potential. The kind that laughs at pair programming and late-night merges.
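
For anyone who has not met this bug class, here is a minimal sketch of how signed 32-bit arithmetic on TCP sequence numbers can go sideways. It is an illustration only: it is not the OpenBSD code and not the specific flaw Mythos reported. The helper names (to_int32, seq_lt, block_len) are hypothetical, and Python integers are wrapped by hand to mimic a C int.

```python
# Illustrative sketch only: emulates 32-bit signed arithmetic in Python.
# The helper names are hypothetical and this is NOT the OpenBSD SACK code.

def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x >= 0x8000_0000 else x


def seq_lt(a: int, b: int) -> bool:
    """Common 'a comes before b' idiom: the signed 32-bit difference is negative."""
    return to_int32(a - b) < 0


def block_len(start: int, end: int) -> int:
    """Length of a [start, end) block computed as a signed 32-bit difference."""
    return to_int32(end - start)


# Within 2**31 of each other, the idiom handles sequence wraparound correctly.
near_wrap_ok = seq_lt(0xFFFF_FFF0, 0x0000_0010)

# More than 2**31 apart, the ordering silently flips.
far_apart = seq_lt(0, 0x8000_0001)

# A block spanning exactly 2**31 bytes comes out with a negative "length".
negative_len = block_len(0, 0x8000_0000)

print(near_wrap_ok, far_apart, negative_len)
```

Any bounds check that trusts a comparison or length like these inherits the flipped sign, which is roughly why this class of bug can sit in plain view for years: the code looks right for every value a well-behaved peer would send.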

Who’s Cashing In on These ‘Discoveries’?

Here’s the cynical vet take: follow the partners. AWS, Apple, Google, Microsoft — all in. Competitors hugging it out over AI bug-finding. Why? Because if attackers get this tech first (spoiler: they will), the defense scramble pays dividends. Anthropic’s dangling $100M in credits, $2.5M to Linux Foundation. Nice PR. But who’s really bankrolling the open-source rescue? These giants aren’t charities; they’re prepping moats.

Mythos didn’t just spot bugs. It chained Linux kernel vulns into a full priv-esc path — stack canaries smashed, KASLR dodged, W^X bypassed. FreeBSD? 17-year-old NFS RCE, unauthenticated root. No human nudge required.

Firefox 147 got hammered: 181 JS shellcode exploits. Previous champ, Claude Opus 4.6? Managed two. Oof.

“I’ve found more bugs in the last couple of weeks than I found in the rest of my life combined.” — Nicholas Carlini, security researcher

That’s not hyperbole. Carlini’s no rookie. Mythos reverse-engineered binaries, rebuilt sources, pentested like a pro. Over 1,000 critical vulns across OSes and browsers. Humans reviewed 198 of the reports and agreed with the model’s severity rating on 89% of them.

Benchmarks? Brutal.

Benchmark            Claude Opus 4.6   Claude Mythos Preview
CyberGym             66.6%             83.1%
SWE-bench Verified   80.8%             93.9%
SWE-bench Pro        53.4%             77.8%
Terminal-Bench 2.0   65.4%             82.0%

+24.4 points on SWE-bench Pro. That’s no tweak; it’s a sledgehammer.
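
The gaps in the table above are easy to check. A quick sketch, with the scores copied from the table; the function name point_gains is just a label picked for this example.

```python
# Scores copied from the benchmark table, in percent: (Opus 4.6, Mythos Preview).
scores = {
    "CyberGym": (66.6, 83.1),
    "SWE-bench Verified": (80.8, 93.9),
    "SWE-bench Pro": (53.4, 77.8),
    "Terminal-Bench 2.0": (65.4, 82.0),
}


def point_gains(table):
    """Return the percentage-point gain per benchmark, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (old, new) in table.items()}


gains = point_gains(scores)

# Print the largest jump first.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} +{gain:.1f} points")
```

These are percentage-point differences, not relative improvements, and they say nothing about how either model was prompted or scaffolded during the runs.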

CyberGym is a real-world cyber ops sim from Berkeley. Not playground CTFs.

But wait — my unique angle, absent from Anthropic’s glossy drop: this echoes the fuzzing boom of the 2010s. AFL, libFuzzer hyped as bug-killers. Billions of runs, yet FFmpeg laughed. Mythos? Same promise, fancier wrapper. Remember Heartbleed? Two years post-release, humans missed it. Tools evolved, but attackers always one step ahead. Prediction: Mythos sparks an AI security arms race. Defenders get it now; red teams clone it tomorrow. Window from zero-day to exploit? Already minutes, per Linux’s Greg Kroah-Hartman.

Does Claude Mythos Actually Beat Fuzzers and Humans?

Short answer: on these tests, yeah. It builds ROP chains (20+ gadgets), JIT sprays, multi-vuln chains. Autonomous. Black-box to boot.

Long answer — skepticism kicks in. Benchmarks are Anthropic’s playground. CyberGym? They probably tuned for it. SWE-bench? Coding tasks, sure, but real kernels fight dirty with ASLR randomization, custom allocators. And those 5 million FFmpeg fuzzes? What corpus? Mythos wins on directed analysis, but scale it to Chromium’s 30M lines — does it choke?
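
Why might millions of fuzz runs miss a bug at all? A toy model helps, with loud assumptions: suppose the bug fires only when one 32-bit field holds one exact value, and the fuzzer picks that field uniformly at random on every run. Real fuzzers such as AFL and libFuzzer are coverage-guided and do far better than this, and nothing here describes FFmpeg's actual bug or corpus. The function name and the 2**-32 figure are made up for illustration.

```python
import math


def p_at_least_one_hit(runs: int, p_single: float) -> float:
    """P(at least one triggering input) = 1 - (1 - p)**runs, computed stably."""
    return -math.expm1(runs * math.log1p(-p_single))


RUNS = 5_000_000        # run count quoted in the article
P_SINGLE = 2.0 ** -32   # ASSUMPTION: one magic value in a 32-bit field

p_hit = p_at_least_one_hit(RUNS, P_SINGLE)
expected_hits = RUNS * P_SINGLE

print(f"chance of at least one hit: {p_hit:.4%}")
print(f"expected hits: {expected_hits:.6f}")
```

Under those assumptions the run count barely dents the search space, which is the intuition behind the "what corpus?" question: raw volume matters less than whether the inputs ever get steered toward the narrow condition.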

Partners absent? OpenAI. Politics? Or Sam Altman laughing last with o1-preview’s quiet wins?

No general release. Preview only. Smart — control the narrative, fund the faithful. OpenSSF, Apache get cash. Real funding, sure. But it’s defense-first. Attackers? They’ll fine-tune Llama on this paper and go wild.

Look, I’ve covered Valley hype cycles. Deep Blue beat Kasparov — chess unchanged for pros. AlphaGo? Go masters adapted. This? Security’s existential. Greg K-H: “Something happened a month ago… now real reports.” From slop to signal overnight.

Urgency’s legit. Attack surface exploding — IoT, browsers, kernels. Humans can’t keep up; one dev slip, nation-state pwns.

Yet here’s the rub: who pays? Anthropic’s Claude subscriptions spike. Partners bake it into AWS GuardDuty, Google’s Mandiant. Open source? Free audits now, but strings attached later.

Why Should Developers Care About Mythos?

You’re shipping code. Mythos flags what static analyzers miss — logic bombs, race conditions via reasoning. Not just syntax.

But don’t ditch your linter. AI hallucinates (remember early Copilot vulns?). Pair it with humans.

Bold call: by 2026, every kernel commit runs Mythos-like checks. Mandated. Or else.

Funding flows: 40+ orgs access. JPMorgan securing infra. NVIDIA? CUDA vulns next?

Simon Willison flags OpenAI no-show. Watch that schism.

Bottom line — game shifted. Bugs that dodged decades? Toast. But trust the benchmarks? Halfway. Profit the partners? Absolutely.

Frequently Asked Questions

What is Claude Mythos? Anthropic’s AI model specialized in finding software vulnerabilities autonomously, spotting decades-old bugs in OpenBSD, FFmpeg, and kernels.

Does Claude Mythos find unknown vulnerabilities? Yes — it discovers novel ones, chains exploits, and beats benchmarks like CyberGym by 16+ points, even reverse-engineering binaries.

Will Claude Mythos be publicly available? No general release planned; it’s preview access for partners and funded orgs, focused on defensive research.

Written by Sarah Chen

AI research editor covering LLMs, benchmarks, and the race between frontier labs. Previously at MIT CSAIL.

Originally reported by Dev.to
