Coffee gone cold on my desk, screen glowing with parsed logs from ~/.claude/—that’s when I saw it: 187 sessions, 3.3 billion tokens, $6,744 equivalent burn.
Claude Code token usage isn’t what you think. We’ve all been there, firing up Anthropic’s coding agent for bots, automations, side hustles, convinced it’s efficient magic. But this deep dive—built with a Rust CLI called ccwhy—rips the lid off.
And here’s the cynical truth after 20 years watching Valley hype cycles: Anthropic’s raking it in on cache reads, while you’re left tweaking habits to avoid bankruptcy.
Why Claude Code’s Token Bill Sneaks Up on You
97% cache reads. Every turn, Claude re-reads the entire conversation context, like flipping back to page one of War and Peace with each paragraph. Cache reads are cheap per token, sure, about $1.50 per million, but at 97% of total volume they still dominate the bill.
The controllable slice? A measly 2.8% of the total. Break that slice down: 92.5% cache creation (CLAUDE.md files, MCP tools, system prompts), 6.6% Claude's output, 0.9% your typed input. The part you directly control is tiny.
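To see why cheap cache reads still dominate the bill, here's a back-of-the-envelope sketch. The prices and token counts below are illustrative assumptions for a heavy month, not ccwhy output or official Anthropic rates:

```python
# Back-of-the-envelope cost split for a hypothetical usage profile.
# All prices are assumptions for illustration (USD per million tokens).
PRICE_PER_M = {
    "cache_read": 1.50,    # cheap per token, but ~97% of volume
    "cache_write": 18.75,  # assumed cache-creation premium
    "input": 15.00,        # assumed uncached input price
    "output": 75.00,       # assumed output price
}

def cost_breakdown(tokens_by_kind: dict) -> dict:
    """Return dollar cost per token category."""
    return {
        kind: count / 1_000_000 * PRICE_PER_M[kind]
        for kind, count in tokens_by_kind.items()
    }

# Hypothetical month: cache reads dwarf everything else.
tokens = {
    "cache_read": 3_200_000_000,
    "cache_write": 85_000_000,
    "input": 900_000,
    "output": 6_000_000,
}
costs = cost_breakdown(tokens)
total = sum(costs.values())
for kind, dollars in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{kind:>11}: ${dollars:>8,.2f} ({dollars / total:.0%})")
```

Run the arithmetic and the punchline falls out: the "cheap" line item is the biggest one, because volume beats unit price.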
I’ve been using Claude Code heavily for the past month. Building trading bots, automation tools, side projects. I knew I was burning through tokens but never looked at the numbers.
This dev didn’t either, until the CLI hit. It’s open sourced now: brew install SingggggYee/tap/ccwhy, or cargo install. It runs entirely offline, no API keys needed. And it tells you why you’re burning tokens, not just how much, which is where it goes beyond ccusage.
Peak hours (Mon-Fri, 5-11am PT) guzzled 1.3x the tokens of off-peak. Bash tools? 40% of all tool calls, stuffing long command outputs back into context. Subagents? 840 calls, each one duplicating the full context for what was often a simple search.
86 sessions ballooned past 30 turns without a single /compact, their context swelling 2-3x. 35 anomalous sessions burned at 2-3x the normal rate. Sound familiar?
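You can spot the long-session problem yourself before reaching for any tool. A minimal sketch, assuming one JSONL file per session under ~/.claude/ where each line is a message object with a "content" field; the real on-disk schema may differ, so treat the field names here as assumptions:

```python
import json
from pathlib import Path

COMPACT_MARKER = "/compact"
TURN_LIMIT = 30

def flag_long_sessions(session_dir: Path) -> list:
    """Flag sessions that run past TURN_LIMIT turns with no /compact.

    Assumes one JSONL file per session, each line a message dict
    carrying a "content" field -- an assumption, not the documented
    Claude Code log schema.
    """
    flagged = []
    for path in session_dir.glob("*.jsonl"):
        turns, compacted = 0, False
        for line in path.read_text().splitlines():
            msg = json.loads(line)
            turns += 1
            if COMPACT_MARKER in str(msg.get("content", "")):
                compacted = True
        if turns > TURN_LIMIT and not compacted:
            flagged.append((path.name, turns))
    return flagged
```

Point it at your session directory and the sessions quietly tripling their own context show up as a short list of filenames.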
Is 97% Cache Reads Normal for Claude Code?
Maybe for heavy users. But is it sustainable? Here’s my unique take, drawn from the dot-com bust: remember AWS’s early data transfer fees? Devs built empires on EC2, then got hammered on egress costs nobody saw coming. Claude’s cache is today’s hidden tax—Anthropic banks on you not parsing ~/.claude/.
Predict this: within a year, they’ll roll out smarter context compression, or users revolt to open models like Llama. Because who’s really winning? Not you, staring at bills.
Post-analysis fixes worked. /compact after 20 turns. Ditch Agent for grep/glob on codebases. Shift heavy lifts off-peak. Anomalies vanished.
But let’s wander a bit—I’ve seen tools like this before. Back in 2010, heroku logs revealed dyno sleep wastes; devs scripted around it. Same vibe. Claude Code’s powerful, yeah, but raw without scrutiny.
Bash dominance? Brutal. Piping full command outputs into context—why? Tools should summarize, not regurgitate. Subagents duplicating context? Amateur hour. Feels like Anthropic prioritized speed over smarts, cashing in on your oversight.
Who Profits from Your Claude Code Habits?
Anthropic, obviously. Max plan softens the blow (equivalent API cost, not your tab), but patterns scream inefficiency. You’re subsidizing their infra while chasing productivity.
Look, I’ve covered every AI wave—Watson, GPT-3, now this. Buzzword-free truth: token optimization is the new devops. ccwhy’s your SRE for AI bills.
When I shared these breakdowns, the comments flooded with “my cache is 95% too.” Normal? For now. But expect Anthropic PR spin: “Cache enables magic!” Yeah, and egress built the cloud.
Small changes yield big savings. That 0.9% input? Polish your prompts tighter. Output at 6.6%? Guide Claude toward brevity.
One anomaly session: 2-3x burn, maybe a loop or bad tool chain. Gone with discipline.
Fixing Claude Code Token Waste Today
Start with ccwhy. Install, run, stare. Then:
- /compact religiously.
- Grep over Agent.
- Off-peak heavy lifts.
- Summarize tool outputs.
Devs report 20-30% drops already. Not revolutionary—pragmatic.
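“Summarize tool outputs” is the one fix you can script yourself today. A minimal sketch of the idea (the function name and the head/tail truncation policy are mine, not a Claude Code feature): clamp a long command’s output to its first and last lines before it ever re-enters the context.

```python
def clamp_output(text: str, head: int = 20, tail: int = 10) -> str:
    """Keep the first `head` and last `tail` lines of a tool's output,
    replacing the middle with a one-line elision marker."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # short output passes through untouched
    dropped = len(lines) - head - tail
    return "\n".join(
        lines[:head]
        + [f"... [{dropped} lines elided] ..."]
        + lines[-tail:]
    )
```

Wrap your noisiest commands (test runners, build logs) in something like this and the 40%-of-calls Bash problem stops dragging the whole transcript along with it.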
Skeptical vet sign-off: Claude Code’s no silver bullet. Great for prototypes, but scale? Parse your logs, or pay forever.
Frequently Asked Questions
What does ccwhy do for Claude Code?
Parses local ~/.claude/ data offline, breaks down token usage by cache, input, output—shows why your bill’s high.
How to reduce Claude Code token costs?
Use /compact after 20 turns, grep instead of Agent searches, avoid peak hours, summarize tool outputs.
Is 97% cache reads normal in Claude Code?
Common for long sessions, but optimize or it’ll eat your budget—Anthropic profits most.