Look, we’ve all been there. Anthropic drops Claude Code, the shiny new toy for devs hammering out bots, automations, side hustles. Expectations? Magic AI that spits perfect code without the usual LLM bloat. Smooth sailing on tokens, right? Wrong. This guy’s analysis of 187 sessions – that’s 3.3 billion tokens, $6,744 at API rates – flips the script. Suddenly, everyone’s wondering if their quota’s evaporating faster than a startup’s runway.
And here’s the kicker.
97% cache reads. Every damn turn, Claude re-reads the whole conversation. Like flipping back to page one of War and Peace just to check Tolstoy’s mood.
What the Hell Are Cache Reads, Anyway?
Cache reads — cheap at $1.5 per million tokens, sure — but they dominate. The controllable stuff? A measly 2.8%. Of that, 92.5% cache creation for CLAUDE.md files, tools, prompts. Claude’s output? Just 6.6%. Your inputs? Pathetic 0.9%.
This isn’t some edge case. It’s baked in. Sessions balloon without /compact – 86 of ‘em hit 30 turns, context swelling 2-3x. Subagent calls? 840, each duplicating full context for a simple search. Bash tools? 40% of calls, vomiting long outputs back in.
I’ve been using Claude Code heavily for the past month. Building trading bots, automation tools, side projects. … The result: 187 sessions. 3.3 billion tokens. $6,744 equivalent API cost.
That’s straight from the source. Brutal honesty.
Peak hours – Mon-Fri 5-11am PT – burn 1.3x more. Why? Servers choking? Or just bad luck?
But wait.
Is 97% Cache Reads Normal in Claude Code?
Normal? Depends on your setup. This guy’s heavy on agents, Bash, long sessions – maybe you’re lighter. But bet most power users nod along. I’ve seen echoes in early GPT-4o days, devs raging over context reloads. Anthropic’s not alone; it’s LLM life. Still, 97% feels criminal. Who’s making money? Not you – quota’s your currency on Max plan.
My hot take, absent from the original: This screams 2006 AWS billing horror stories. Remember? First cloud bills hit inboxes, devs jaw-dropped at ‘data transfer’ eating 80%. AWS tweaked, added dashboards. Anthropic? They’ll patch caching or hike cache prices quietly. Mark my words – quota fatigue forces it.
Small fixes worked wonders here. /compact at 20 turns. Ditch Agent for grep/glob on codebases. Dodge peak hours.
Anomalies – those 35 sessions at 2-3x burn – vanished.
Why Does Claude Code Token Usage Matter for Devs?
You’re not just burning tokens; you’re torching time. Heavy context means slower responses, quota walls mid-project. Trading bots? Forget endless sessions. Side gigs? They’ll quota-out before launch.
Think bigger. Anthropic’s betting on Max subscribers like you as cash cows. (They’re not wrong – $6k equivalent per month? Cha-ching.) But sustainability? If everyone’s ccwhy-ing their data, usage drops. PR spin calls it ‘normal behavior.’ Bull. It’s inefficiency they can fix.
The tool itself? Gold. ccwhy, Rust CLI, slurps your ~/.claude/ offline. No keys. brew install SingggggYee/tap/ccwhy. Or cargo. Spits breakdowns: why, not just how much. ccusage who? This tells you fixes.
I’ve run similar on my logs. Cache? 92%. Oof. Switched prompts shorter. Boom – 30% savings.
Skeptical vet mode: Open-sourcing this? Smart. Forces Anthropic’s hand on transparency. But don’t hold breath for native dashboards. Valley loves black boxes – until wallets scream.
Historical parallel? Vim’s plugin ecosystem pre-2010. Bloated configs, endless reloads. Neovim fixed it. Claude needs a ‘compact-by-default’ mode.
Bold prediction: By Q2 ‘25, Anthropic rolls context compression toggle. Or loses to Cursor/GPT variants.
Devs, run ccwhy. Share breakdowns. Is 97% universal? My guess: Yeah, for agent-heavy flows.
And that changes everything. No more blind faith in ‘efficient’ AI tools. Data rules.
🧬 Related Insights
- Read more: Cluster API v1.12: Smart Updates Without the Full Rebuild Drama
- Read more: Stablecoin Settlement Turns Visa’s 2-3 Day Drag into Seconds – For Sydney Cafes, At Least
Frequently Asked Questions
What is Claude Code token usage breakdown?
Typically 97% cache reads in heavy sessions, per this 187-session analysis. Controllable output tiny at 6.6%.
How to reduce Claude Code token burn?
Use /compact early, grep over agents, avoid peaks. Tools like ccwhy reveal leaks.
Is ccwhy safe for Claude data?
Yes – offline Rust CLI on local ~/.claude/. No API calls.