Your Claude Code session sputters — 198,000 tokens in, and it’s begging for mercy.
That’s the token budget in action, that hard 200K limit squeezing system prompts, tools, history, and output into one unforgiving pile. Every byte counts, especially when you’re deep in a coding marathon with MCP servers dumping schemas and the model churning verbose explanations. But two clever projects, Caveman and Tool Search, are flipping the script — one by making Claude talk like a caveman, the other by hiding tools until they’re needed.
Why Bother Squeezing Claude’s Token Budget?
Look. Sessions don’t die from one fat prompt. They bleed out over turns — tool definitions reloaded every API call, model chit-chat piling into history, forever costing input tokens. A 60K tool schema? Multiply by 50 calls: 3 million inputs, easy. Model output? That “Sure, happy to help” early on haunts you through 40 turns of re-processing.
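That schema math is worth seeing in one place. A back-of-the-envelope sketch (illustrative numbers from the example above, not measurements):

```python
# Illustrative math only: tool definitions ride along on every API call,
# so input cost scales linearly with call count. Numbers mirror the
# article's example; real schema sizes vary per MCP server.

def cumulative_input_tokens(schema_tokens: int, calls: int) -> int:
    """Total input tokens burned just re-sending tool schemas."""
    return schema_tokens * calls

total = cumulative_input_tokens(schema_tokens=60_000, calls=50)
print(f"{total:,}")  # 3,000,000 input tokens from schemas alone
```

Nothing clever there, and that's the point: the bleed is structural, not a bug in your prompts.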
Caveman hits output. Tool Search hits inputs. Together? They expose how Anthropic’s context window — generous on paper — chokes real workflows.
And here’s my angle the originals miss: this mirrors the browser wars of the ’90s. Remember bloated Java applets loading everything upfront? Then Netscape’s plugin model deferred assets. Caveman’s your Gzip for prose; Tool Search, the lazy CDN. History says: compose ‘em, and dev tools evolve.
How Caveman Makes Claude Grunt Less
“Why use many token when few do trick.”
That’s Caveman’s motto, straight from its README — a prompt hack telling Claude to ditch articles, hedging, filler. Technical meat stays; pleasantries evaporate.
Benchmarks? 65% average savings across ten tasks. One hook in your Claude Code setup, plus caveman-compress rewriting your CLAUDE.md at session start. Lossy? Sure — reads like a caveman dictating code. But when headroom’s 17%, who cares about “the”?
Install’s dead simple: skill file, two hooks. Output drops from 5K to 1.8K tokens per turn, a 3.2K saving that compounds to 128K over 40 turns. The model controls this slice: only 2.5% of the budget per turn, but it snowballs into history.
Trade-off: personality. Claude’s charm? Gone. Gruff genius emerges. (Sacrifices readability for humans skimming logs, but you’re debugging, not blogging.)
What Tool Search Defers — and Why It Pays Off
Three MCP servers, 50 tools each: 60K schema tokens upfront. Every. Single. Call.
Tool Search flips it. Hide schemas behind a discovery tool — model queries what it needs, loads matching defs only. Lossless. API layer magic.
Pipeline: multi-stage, snapshots survive compaction. Docs in tool-search-deep-dive.md spell it out. Savings? Deferred tokens times turns until discovery. Unused tools? Zero cost forever.
In that breakdown — tools at 12.5% — it’s small. Until you scale: 25K per call x 50 = 1.25M inputs slashed. User controls files, but system force-feeds tools. This starves the beast smartly.
But. Discovery adds latency — one extra call per tool hunt. Pros barely notice the round-trip; for token-pinched flows, the savings are the killer feature.
Can You Stack Caveman and Tool Search?
They don’t compete. They compose.
Caveman shrinks what Claude spits. Tool Search trims what it slurps. Run both: attack model output (2.5%) and tool defs (12.5%). History? Still compaction’s turf — but freed headroom buys longer chats.
Per-turn math sings:

```
caveman_savings = avg_output * turns * 0.65
tool_search_savings = deferred_tools * turns_to_use
```
Stack ‘em in a 40-turn session with heavy MCP: 200K+ reclaimed. Prediction: Anthropic bakes this in natively. Why? Context windows creep toward 1M, but dev sessions scale faster — plugins like these bridge the gap till then.
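Plugging illustrative values into those per-turn formulas (the deferred-schema size and discovery turn are assumptions for the sketch, not measured):

```python
# Worked example of the stacking math. All inputs are assumed values
# chosen to match the article's 40-turn heavy-MCP scenario.
avg_output, turns, caveman_rate = 5_000, 40, 0.65
deferred_tools = 20_000   # assumed: 20K of the 25K schema slice deferred
turns_to_use = 10         # assumed: matching defs first loaded at turn 10

caveman_savings = int(avg_output * turns * caveman_rate)  # 130,000
tool_search_savings = deferred_tools * turns_to_use       # 200,000
print(f"{caveman_savings + tool_search_savings:,}")       # 330,000
```

Swap in your own schema sizes and turn counts; the "200K+" claim clears easily under most heavy-MCP assumptions.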
Corporate spin check: Anthropic hypes windows, ignores per-turn bleed. These hacks prove it — budget’s not size, it’s spendthrift architecture.
Breakdown again, colored:
SYSTEM: 3K (prompts, memory) — compress with caveman-md.
TOOLS: 25K — defer with search.
HISTORY: 80K — grows every turn.
TOOL OUT: 50K — user-driven.
MODEL OUT: 5K — grunt it down.
HEADROOM: 35K — buy more.
Smallest slices first. Smart.
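Sanity-check the slices: they sum to exactly the 198K the opening line groans about, against the 200K hard limit.

```python
# The budget breakdown above, as a quick checksum.
budget = {
    "SYSTEM": 3_000,     # prompts, memory
    "TOOLS": 25_000,     # MCP schemas
    "HISTORY": 80_000,   # grows every turn
    "TOOL OUT": 50_000,  # user-driven
    "MODEL OUT": 5_000,  # caveman's target
    "HEADROOM": 35_000,  # what's left
}
print(f"{sum(budget.values()):,}")  # 198,000 of the 200K limit
```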
Deeper how: Caveman’s prompt is engineered psychology. Claude’s trained to be verbose; the caveman role-play forces brevity, stripping 75% of the fluff. Why does it work? LLMs hedge for safety; the caveman persona overrides that instinct.
Tool Search: index + query loop. Model asks “tools for git?” Discovery fetches, injects. Snapshots persist post-compaction — clever state hack.
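The index-plus-query loop is easy to picture. A toy sketch of the defer-then-discover pattern — hypothetical names throughout, not Tool Search's actual API — where full schemas stay out of context until a query matches their one-line summary:

```python
# Toy defer-then-discover loop. TOOL_INDEX holds cheap one-line
# summaries; FULL_SCHEMAS stand in for the expensive definitions
# that only get injected on a match.
TOOL_INDEX = {
    "git_commit": "git: stage and commit changes",
    "git_diff":   "git: show working-tree diff",
    "db_query":   "database: run a SQL query",
}
FULL_SCHEMAS = {name: f"<full JSON schema for {name}>" for name in TOOL_INDEX}

def discover(query: str) -> dict[str, str]:
    """Return full definitions only for tools whose summary matches."""
    return {
        name: FULL_SCHEMAS[name]
        for name, summary in TOOL_INDEX.items()
        if query.lower() in summary.lower()
    }

loaded = discover("git")   # injects two schemas; db_query costs nothing
print(sorted(loaded))      # ['git_commit', 'git_diff']
```

The unused `db_query` schema never touches the context — the "zero cost forever" property from above, in miniature.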
Sacrifice? Caveman’s tone jars teams. Tool Search’s extra hops (1-2 turns). Stack penalty? Negligible — orthogonal.
Real session: onboard MCPs, debug monorepo. Without: context exhaustion at turn 30. With: sails to 60. That’s the shift — intentional spending.
Why Does This Matter for Claude Code Users?
DevTools Feed readers, you’re smart. You know 200K feels roomy till it isn’t. These aren’t gimmicks; they’re architectural probes. Expose where budgets leak. Next? Compress history, smart-file chunking.
Bold call: OSS like this forces Anthropic’s hand. Claude Code evolves via plugins — Caveman’s 65% proves prompt hacks beat waiting on bigger context windows.
Run ‘em. GitHub awaits.
Frequently Asked Questions
What is Caveman for Claude Code?
Caveman’s a plugin prompting Claude to speak tersely — drops filler for 65% token savings on output, plus a compressor for project memory.
How does Tool Search save tokens in Claude?
It hides tool schemas behind a discovery API, loading only what’s called — slashing per-turn input costs from unused MCP tools.
Can Caveman and Tool Search be used together?
Yes, they complement: one cuts output, the other inputs. Stack for max headroom in long sessions.