theAIcatchup

[Figure: METR logarithmic chart plotting AI models' software task performance against human completion time]

Claude Opus 4.6 Tackles 12-Hour Coding Marathons—But the Metrics Are Wobbling

Imagine an AI chewing through software tasks that would bury a human for 12 hours straight. That is where Claude Opus 4.6 lands on the METR chart, but the confidence interval around that 12-hour estimate spans 5 to 66 hours. Is progress exploding, or is it a mirage?

5 min read 1 month ago

