You’re an engineer staring down a sprawling codebase at 2 a.m., deadline breathing down your neck. Claude Code used to be your midnight hero—diving deep, reasoning step-by-step, spitting out fixes that just worked. Now? It’s like handing the keys to a drunk intern. February’s updates gutted it for complex tasks, and the logs don’t lie.
The Claude Code regression hit power users hard. Sessions shortened. Errors skyrocketed. All because Anthropic redacted the model’s internal thinking: those precious chains of reasoning that made it shine.
Why Did Claude Code Suddenly Break for Big Projects?
Look, it’s not hype. A dev mined 6,852 session files, 17,871 thinking blocks, 234,760 tool calls. Pure data gold. Starting mid-February, thinking depth cratered by 67% before redaction even kicked in fully. By March, it was invisible to users, but the damage? Baked in.
Here’s the smoking gun, straight from their analysis:
> Quantitative analysis of 17,871 thinking blocks and 234,760 tool calls across 6,852 Claude Code session files reveals that the rollout of thinking content redaction (redact-thinking-2026-02-12) correlates precisely with a measured quality regression in complex, long-session engineering workflows.
That rollout? Staged over a week: 1.5% redacted on March 5, jumping to 100% by March 12. Quality complaints flooded in exactly when redaction crossed 50%—March 8. Coincidence? Nah.
And thinking wasn’t fluff. Median length dropped from 2,200 characters to 560-720. That’s the model’s brain shrinking mid-thought.
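Want to check your own logs? Here’s a minimal sketch of that measurement, assuming the session files are JSONL where assistant events carry content blocks of type "thinking". The schema here is inferred, not documented, so adjust the field names to whatever your logs actually contain.

```python
import json
import statistics
from pathlib import Path

def thinking_lengths(session_dir: str) -> list[int]:
    """Collect the character length of every thinking block under session_dir."""
    lengths = []
    for path in Path(session_dir).expanduser().glob("**/*.jsonl"):
        for line in path.read_text(encoding="utf-8").splitlines():
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            message = event.get("message") if isinstance(event, dict) else None
            if not isinstance(message, dict):
                continue
            content = message.get("content")
            if not isinstance(content, list):
                continue  # some events carry plain-string content
            for block in content:
                if isinstance(block, dict) and block.get("type") == "thinking":
                    lengths.append(len(block.get("thinking", "")))
    return lengths

lengths = thinking_lengths("~/.claude/projects")  # adjust to wherever your logs live
if lengths:
    print(f"{len(lengths)} blocks, median {statistics.median(lengths):.0f} chars")
```

Run it over date-bucketed copies of your sessions and you can reproduce the before/after comparison on your own data.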
Frustration metrics? Stop hook violations—zero before March 8, 173 in 17 days after. User prompts laced with annoyance jumped 68%. Sessions tanked from 35.9 to 27.9 prompts. The AI started dodging ownership, looping in dumb reasoning circles.
Tool use flipped catastrophically. Before mid-February: 6.6 reads per edit. Research-first, cautious beast. After? 2.0 reads per edit. Edit-first cowboy, 70% less prep. It’s like a surgeon skipping the scan, grabbing the scalpel blind.
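The ratio itself is trivial to compute once tool calls are extracted. The tool names below match Claude Code’s built-ins, but which ones count as "read" versus "edit" is my bucketing, not necessarily the analyst’s:

```python
from collections import Counter

# Assumed bucketing: illustrative, not necessarily the report's.
READ_TOOLS = {"Read", "Grep", "Glob"}
EDIT_TOOLS = {"Edit", "Write"}

def read_edit_ratio(tool_calls: list[str]) -> float:
    """Reads per edit across one session's tool calls."""
    counts = Counter(tool_calls)
    reads = sum(counts[t] for t in READ_TOOLS)
    edits = sum(counts[t] for t in EDIT_TOOLS)
    return reads / edits if edits else float("inf")

# A research-first session: plenty of reconnaissance before a single edit.
calls = ["Read", "Grep", "Read", "Glob", "Read", "Read", "Read", "Edit"]
print(read_edit_ratio(calls))  # 7.0
```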
Is This Just a Temporary Hiccup—or AI’s Growing Pains?
But here’s my take, the one nobody’s yelling yet: this echoes the browser wars of the ’90s. Proprietary lock-in clipped the open web’s wings, Internet Explorer bloated up, and standards eventually won. Anthropic’s redaction feels like cost-cutting for casuals, hobbling the pros who train the beast. AI isn’t a toy; it’s a platform shift, like TCP/IP birthing the web. Clip the thinking tokens, and you starve the evolution.
Bold call: they’ll fix it. Power users like these log-miners are the vanguard. Ignore them, and Claude fades like early Wolfram Alpha—smart but stiff. Restore full thinking for Opus-tier workflows, and watch it soar again. We’re early in this shift; regressions are jet fuel for iteration.
Think of Claude’s brain as a jazz soloist riffing before the big note. Redact the improv? You get elevator muzak. Devs need that wild, extended reasoning for convention-hunting, multi-file dances, safe mutations. Without it, it’s guesswork in a minefield.
Data screams it: the read:edit ratio nosedived from 6.6 to 2.0. Research:mutation fell from 8.7 to 2.8. The model stopped grepping related files and skipped context. Boom: incorrect ‘simplest fixes,’ changes that do the opposite of instructions, false completion claims.
Anthropic, if you’re reading (and you are), this isn’t griping. It’s a love letter with receipts. Your API powered high-complexity wins till February. Now? Workarounds exhausted, teams eyeing rivals. But you’re the futurists here: double down on thinking tokens for senior workflows. Make it opt-in: casuals get snappy, pros get profound.
Wider ripple? Every dev shop leaning on AI agents just hit a wall. Delayed ships. Ballooning debug time. It’s not abstract—it’s payroll hours vaporized.
Yet wonder sparks. Imagine fixed: Claude Opus with uncapped thinking, weaving through monorepos like a digital Indiana Jones. Cracking legacy knots humans dread. That’s the shift—AI as co-pilot turned captain for the thorny stuff.
How Bad Is the Damage—Really?
Numbers don’t sugarcoat. Ownership-dodging corrections doubled. Sessions with five or more reasoning loops? Zero before, seven after. Prompts per session? Down 22%. It’s systemic.
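The report doesn’t publish how a "reasoning loop" was detected, but one plausible heuristic is flagging consecutive thinking blocks that nearly repeat each other. A sketch, with an arbitrary similarity threshold:

```python
from difflib import SequenceMatcher

def count_reasoning_loops(blocks: list[str], threshold: float = 0.85) -> int:
    """Count adjacent thinking blocks that nearly repeat the previous one.

    'Loop' is a heuristic here: text similarity above the threshold.
    The analysis's exact definition isn't published.
    """
    loops = 0
    for prev, cur in zip(blocks, blocks[1:]):
        if SequenceMatcher(None, prev, cur).ratio() >= threshold:
            loops += 1
    return loops

blocks = ["Check config loader", "Check config loader again", "Try new approach"]
print(count_reasoning_loops(blocks))  # 1: the near-duplicate pair
```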
One table nails the behavioral shift:
| Period | Read:Edit | Research:Mutation | Read % | Edit % |
|---|---|---|---|---|
| Good (Jan 30 - Feb 12) | 6.6 | 8.7 | 46.5% | 7.1% |
| Transition (Feb 13 - Mar 7) | 2.8 | 4.1 | 37.7% | 13.2% |
| Degraded (Mar 8 - Mar 23) | 2.0 | 2.8 | 31.0% | 15.4% |
Seventy percent less homework before every edit. Oof.
This isn’t an anti-AI screed. I’m all-in: these tools rewrite software’s social contract. But platforms stumble; Windows 95 bluescreens taught us that. Anthropic’s PR might spin it as ‘efficiency,’ but the data calls the bluff. Efficiency for whom? Not the ones shipping cathedrals of code.
Fix path? Roll back redaction for long sessions. Or smarter allocation: thinking depth that scales with a complexity score, something like the sketch below. Devs would pay a premium. Hell, I’d subscribe twice.
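What could that opt-in look like from the API side? A sketch using the Anthropic SDK’s extended-thinking budget; the complexity heuristic and the tier math are invented for illustration, not anything Anthropic ships:

```python
import anthropic

def complexity_score(files_touched: int, loc_estimate: int) -> float:
    """Toy heuristic: my invention, not Anthropic's."""
    return min(1.0, 0.1 * files_touched + loc_estimate / 5000)

def thinking_budget(score: float) -> int:
    # Scale depth with complexity: casual fixes stay snappy,
    # gnarly multi-file work gets room to reason.
    return int(1024 + score * (32000 - 1024))

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
budget = thinking_budget(complexity_score(files_touched=12, loc_estimate=4000))
response = client.messages.create(
    model="claude-opus-4-20250514",   # swap in whatever model you run
    max_tokens=budget + 4096,         # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": budget},
    messages=[{"role": "user", "content": "Refactor the config loader safely."}],
)
```

Simple tasks get a shallow budget, monorepo surgery gets the full allowance. That’s the depth-scaling idea in a dozen lines.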
Frequently Asked Questions
What caused Claude Code regression in February?
Thinking depth was slashed 67% starting mid-February, then hidden entirely by the March redaction rollout. The model shifted to reckless edits without research.
Will Anthropic fix Claude for complex engineering tasks?
Data like this pressures them—expect targeted restores for power users soon. They’ve iterated fast before.
Is Claude still good for simple coding?
Yeah, casual fixes hold up. But complex? Steer clear till patched.