Dynamic languages rule.
Ruby, Python, JS—they’re the sprinters in this 13-Language Claude Code Benchmark. Ruby clocked in at $0.36 a pop, 73 seconds flat. Python? Close behind. JavaScript? Not far off. Static languages? They huffed and puffed, 1.4 to 2.6 times slower, pricier too. A Ruby committer ran the numbers—over 600 trials implementing mini-Git. All open on GitHub. No excuses.
Ruby averaged $0.36 per run at 73.1 seconds, Python came in at $0.38 per run and 74.6 seconds, and JavaScript at $0.39 per run and 81.1 seconds. All three had low variance and passed all tests across all 40 runs.
That’s the raw data talking. Not hype.
Why Do Static Languages Choke Here?
Think about it. Claude Opus 4.6—Anthropic’s beast—churns out code. Task? Simplified Git: init, add, commit, log. Then v2: status, diff, checkout, reset. Custom hash algo to dodge library drama. Fair fight.
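The actual benchmark code lives in Endoh's public repo; what follows is purely an illustrative sketch of the task's shape, not his implementation. The `MiniGit` class and `toy_hash` function here are hypothetical, and the toy hash just stands in for whatever custom algorithm the benchmark specifies:

```ruby
require "fileutils"
require "json"

# Illustrative sketch only -- NOT the benchmark's code.
# A toy content hash standing in for the "custom hash algo".
def toy_hash(data)
  data.each_byte.reduce(5381) { |h, b| ((h * 33) ^ b) & 0xFFFFFFFF }.to_s(16)
end

class MiniGit
  DIR = ".minigit"

  def init
    FileUtils.mkdir_p("#{DIR}/objects")
    File.write("#{DIR}/index.json", "{}") unless File.exist?("#{DIR}/index.json")
  end

  def add(path)
    data = File.read(path)
    oid  = toy_hash(data)
    File.write("#{DIR}/objects/#{oid}", data)          # store blob by content hash
    index = JSON.parse(File.read("#{DIR}/index.json"))
    index[path] = oid                                   # stage the file
    File.write("#{DIR}/index.json", JSON.generate(index))
  end

  def commit(message)
    tree   = JSON.parse(File.read("#{DIR}/index.json"))
    parent = File.exist?("#{DIR}/HEAD") ? File.read("#{DIR}/HEAD") : nil
    body   = JSON.generate(msg: message, parent: parent, tree: tree)
    oid    = toy_hash(body)
    File.write("#{DIR}/objects/#{oid}", body)           # commit object
    File.write("#{DIR}/HEAD", oid)                      # advance HEAD
    oid
  end

  def log
    oid = File.exist?("#{DIR}/HEAD") ? File.read("#{DIR}/HEAD") : nil
    while oid                                           # walk the parent chain
      c = JSON.parse(File.read("#{DIR}/objects/#{oid}"))
      puts "#{oid} #{c["msg"]}"
      oid = c["parent"]
    end
  end
end
```

Forty-odd lines for v1. Now picture a model emitting that in Rust, with lifetimes and error types, and you see where the token bill comes from.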
Static langs like Rust, Go, C? They lag. Go: $0.50, 101 seconds, wild variance. Rust: $0.54, spread like a bad joke—54 seconds std dev. C? $0.74, bloated at 517 lines vs Ruby’s lean 219. Failures? Rust and Haskell coughed up three duds total. One Rust run: agent gaslit the tests. Hallucination city.
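Those multipliers aren't vibes; they fall straight out of the quoted figures. Go's slowdown relative to Ruby, for instance:

```ruby
# Computed from the per-run times quoted above.
ruby_time = 73.1   # seconds per run (Ruby)
go_time   = 101.0  # seconds per run (Go)

slowdown = (go_time / ruby_time).round(2)
puts slowdown  # 1.38 -- the low end of the 1.4-2.6x range
```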
Type checking? Killer. Mypy on Python: 1.6-1.7x slower. Steep on Ruby: 2-3.2x penalty. TypeScript vs JS: $0.62 to $0.39. Not just annotations—model’s brain burns extra tokens wrestling types.
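To make "extra tokens" concrete: under a checker like Steep, the model must emit RBS signatures alongside the Ruby it would have written anyway, and keep the two consistent. A hypothetical example (the `Repo` module and `blob_path` method are mine, not from the benchmark):

```ruby
# Untyped Ruby: this is all the model has to produce.
module Repo
  def self.blob_path(oid)
    File.join(".minigit", "objects", oid)
  end
end

# With Steep, a matching RBS signature file is also needed --
# extra surface for the model to generate and reconcile:
#
#   # sig/repo.rbs
#   module Repo
#     def self.blob_path: (String oid) -> String
#   end

puts Repo.blob_path("abc123")  # => .minigit/objects/abc123
```

Double the files, double the places to be wrong. That's the tax, before the checker even runs.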
Here’s my twist: this echoes the web’s early days. Remember when Java promised enterprise glory, but JS—loose, dynamic—built the actual internet? Static purists sneered. History laughs last. AI coding? Same script. Prototyping favors the nimble. A Ruby bias? Sure, Endoh’s a committer. But the data doesn’t lie.
Is Type Checking Worth the AI Tax?
Teams love static typing. Catches bugs early, scales big. Fine—for humans. But AI? It’s reasoning on the fly. Types box it in. More tokens, more doubt, more time. Endoh nails it: even 30 vs 60 seconds kills flow in iterative dev. Future models sub-second? Maybe. Till then, dynamic wins sprints.
Critics on Lobsters whine: real prototypes aren’t 200 lines. They run bigger, and that’s where static shines. Ecosystem? Libraries cut code. Fair. But Endoh stripped dependencies on purpose—a pure language test. The failures aren’t bugs; they’re type systems flexing. Yet in AI generation, flex means friction.
Corporate spin? Anthropic funded it with six months of free Claude Max. Cute. But the results hold. No measure of code quality—maintainability, runtime performance? Gaps. Still, for generation speed and cost? Gospel.
Picture this sprawl: you’re hacking prototypes with AI. Static camp pushes Rust for safety. But every iter, you’re waiting—twice as long. Flow breaks. Back to dynamic. That’s the workflow killer. Bold call: expect AI tools to optimize for dynamic first. Static? Catch up or get benched.
Rust fans, don’t @ me. Love the borrow checker. But here? It’s handcuffs.
What About Real-World Hacks?
Scale it up? Tough. A fair benchmark across 13 langs? Nightmare. Endoh admits it. But 600 runs? Solid. All logs and code public. Reproduce it yourself.
Unique angle: this pokes Big Type’s PR machine. “Static or bust” dogma? Cracking. AI shifts power to languages that let models breathe. Prediction: Python/Ruby shops boom in AI era. Go/Rust? Niche forever.
Dry laugh: C’s 517 lines. Like asking a poet for an essay.
Deeper dive—adding types tanks speed. Why? Model simulates constraints. Thinks harder. Tokens spike. JS to TS: same lines, 60% cost jump. Ouch.
Endoh counters scale gripes: bigger tests needed, yeah. But dev flow? That 2x gap bites now.
Frequently Asked Questions
What languages topped the Claude Code benchmark?
Ruby, Python, JavaScript. Fastest, cheapest, zero fails.
Why were static languages slower in this test?
Type reasoning chews tokens. Higher variance, more fails.
Does this mean ditch static typing for AI coding?
For prototypes? Lean dynamic. Scale later, add types manually.