Your Claude Code session sputters — 198,000 tokens in, and it’s begging for mercy.
That’s the token budget in action, that hard 200K limit squeezing system prompts, tools, history, and output into one unforgiving pile. Every byte counts, especially when you’re deep in a coding marathon with MCP servers dumping schemas and the model churning verbose explanations. But two clever projects, Caveman and Tool Search, are flipping the script — one by making Claude talk like a caveman, the other by hiding tools until they’re needed.
Why Bother Squeezing Claude’s Token Budget?
Look. Sessions don’t die from one fat prompt. They bleed out over turns — tool definitions reloaded every API call, model chit-chat piling into history, forever costing input tokens. A 60K tool schema? Multiply by 50 calls: 3 million inputs, easy. Model output? That “Sure, happy to help” early on haunts you through 40 turns of re-processing.
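That schema math is worth seeing in one place. A back-of-the-envelope sketch (illustrative numbers from the example above, not measurements):

```python
# Illustrative math only: tool definitions ride along on every API call,
# so input cost scales linearly with call count. Numbers mirror the
# article's example; real schema sizes vary per MCP server.

def cumulative_input_tokens(schema_tokens: int, calls: int) -> int:
    """Total input tokens burned just re-sending tool schemas."""
    return schema_tokens * calls

total = cumulative_input_tokens(schema_tokens=60_000, calls=50)
print(f"{total:,}")  # 3,000,000 input tokens from schemas alone
```

Nothing clever there, and that's the point: the bleed is structural, not a bug in your prompts.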
Caveman hits output. Tool Search hits inputs. Together? They expose how Anthropic’s context window — generous on paper — chokes real workflows.
And here’s my angle the originals miss: this mirrors the browser wars of the ’90s. Remember bloated Java applets loading everything upfront? Then Netscape’s plugin model deferred assets. Caveman’s your Gzip for prose; Tool Search, the lazy CDN. History says: compose ‘em, and dev tools evolve.
How Caveman Makes Claude Grunt Less
“Why use many token when few do trick.”
That’s Caveman’s motto, straight from its README — a prompt hack telling Claude to ditch articles, hedging, filler. Technical meat stays; pleasantries evaporate.
Benchmarks? 65% average savings across ten tasks. One hook in your Claude Code setup, plus caveman-compress rewriting your CLAUDE.md at session start. Lossy? Sure — reads like a caveman dictating code. But when headroom’s 17%, who cares about “the”?
Install’s dead simple: skill file, two hooks. Output drops from 5K to 1.8K tokens per turn, a 3.2K saving that compounds to 128K over 40 turns. The model controls this slice: only 2.5% of the budget per turn, but it snowballs into history.
Trade-off: personality. Claude’s charm? Gone. Gruff genius emerges. (Sacrifices readability for humans skimming logs, but you’re debugging, not blogging.)
What Tool Search Defers — and Why It Pays Off
Three MCP servers, 50 tools each: 60K schema tokens upfront. Every. Single. Call.
Tool Search flips it. Hide schemas behind a discovery tool — model queries what it needs, loads matching defs only. Lossless. API layer magic.
Pipeline: multi-stage, snapshots survive compaction. Docs in tool-search-deep-dive.md spell it out. Savings? Deferred tokens times turns until discovery. Unused tools? Zero cost forever.
In that breakdown — tools at 12.5% — it’s small. Until you scale: 25K per call x 50 = 1.25M inputs slashed. User controls files, but system force-feeds tools. This starves the beast smartly.
But. Discovery adds latency — one extra call per tool hunt. Pros barely notice the round-trip; for token-pinched flows, the savings are the killer feature.
Can You Stack Caveman and Tool Search?
They don’t compete. They compose.
Caveman shrinks what Claude spits. Tool Search trims what it slurps. Run both: attack model output (2.5%) and tool defs (12.5%). History? Still compaction’s turf — but freed headroom buys longer chats.
Per-turn math sings:

```
caveman_savings = avg_output * turns * 0.65
tool_search_savings = deferred_tools * turns_to_use
```
Stack ‘em in a 40-turn session with heavy MCP: 200K+ reclaimed. Prediction: Anthropic bakes this in natively. Why? Context windows creep toward 1M, but dev sessions scale faster — plugins like these bridge the gap till then.
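Plugging illustrative values into those per-turn formulas (the deferred-schema size and discovery turn are assumptions for the sketch, not measured):

```python
# Worked example of the stacking math. All inputs are assumed values
# chosen to match the article's 40-turn heavy-MCP scenario.
avg_output, turns, caveman_rate = 5_000, 40, 0.65
deferred_tools = 20_000   # assumed: 20K of the 25K schema slice deferred
turns_to_use = 10         # assumed: matching defs first loaded at turn 10

caveman_savings = int(avg_output * turns * caveman_rate)  # 130,000
tool_search_savings = deferred_tools * turns_to_use       # 200,000
print(f"{caveman_savings + tool_search_savings:,}")       # 330,000
```

Swap in your own schema sizes and turn counts; the "200K+" claim clears easily under most heavy-MCP assumptions.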
Corporate spin check: Anthropic hypes windows, ignores per-turn bleed. These hacks prove it — budget’s not size, it’s spendthrift architecture.
Breakdown again, colored:
SYSTEM: 3K (prompts, memory) — compress with caveman-md.
TOOLS: 25K — defer with search.
HISTORY: 80K — grows every turn.
TOOL OUT: 50K — user-driven.
MODEL OUT: 5K — grunt it down.
HEADROOM: 35K — buy more.
Smallest slices first. Smart.
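Sanity-check the slices: they sum to exactly the 198K the opening line groans about, against the 200K hard limit.

```python
# The budget breakdown above, as a quick checksum.
budget = {
    "SYSTEM": 3_000,     # prompts, memory
    "TOOLS": 25_000,     # MCP schemas
    "HISTORY": 80_000,   # grows every turn
    "TOOL OUT": 50_000,  # user-driven
    "MODEL OUT": 5_000,  # caveman's target
    "HEADROOM": 35_000,  # what's left
}
print(f"{sum(budget.values()):,}")  # 198,000 of the 200K limit
```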
Deeper how: Caveman’s prompt is engineered psychology. Claude’s trained to be verbose; the caveman role-play forces brevity, stripping 75% of the fluff. Why does it work? LLMs hedge for safety; the caveman persona overrides that instinct.
Tool Search: index + query loop. Model asks “tools for git?” Discovery fetches, injects. Snapshots persist post-compaction — clever state hack.
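The index-plus-query loop is easy to picture. A toy sketch of the defer-then-discover pattern — hypothetical names throughout, not Tool Search's actual API — where full schemas stay out of context until a query matches their one-line summary:

```python
# Toy defer-then-discover loop. TOOL_INDEX holds cheap one-line
# summaries; FULL_SCHEMAS stand in for the expensive definitions
# that only get injected on a match.
TOOL_INDEX = {
    "git_commit": "git: stage and commit changes",
    "git_diff":   "git: show working-tree diff",
    "db_query":   "database: run a SQL query",
}
FULL_SCHEMAS = {name: f"<full JSON schema for {name}>" for name in TOOL_INDEX}

def discover(query: str) -> dict[str, str]:
    """Return full definitions only for tools whose summary matches."""
    return {
        name: FULL_SCHEMAS[name]
        for name, summary in TOOL_INDEX.items()
        if query.lower() in summary.lower()
    }

loaded = discover("git")   # injects two schemas; db_query costs nothing
print(sorted(loaded))      # ['git_commit', 'git_diff']
```

The unused `db_query` schema never touches the context — the "zero cost forever" property from above, in miniature.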
Sacrifice? Caveman’s tone jars teams. Tool Search’s extra hops (1-2 turns). Stack penalty? Negligible — orthogonal.
Real session: onboard MCPs, debug monorepo. Without: context exhaustion at turn 30. With: sails to 60. That’s the shift — intentional spending.
Why Does This Matter for Claude Code Users?
DevTools Feed readers, you’re smart. You know 200K feels roomy till it isn’t. These aren’t gimmicks; they’re architectural probes. Expose where budgets leak. Next? Compress history, smart-file chunking.
Bold call: OSS like this forces Anthropic’s hand. Claude Code evolves via plugins — Caveman’s 65% proves prompt hacks beat waiting on bigger context windows.
Run ‘em. GitHub awaits.
Frequently Asked Questions
What is Caveman for Claude Code?
Caveman’s a plugin prompting Claude to speak tersely — drops filler for 65% token savings on output, plus a compressor for project memory.
How does Tool Search save tokens in Claude?
It hides tool schemas behind a discovery API, loading only what’s called — slashing per-turn input costs from unused MCP tools.
Can Caveman and Tool Search be used together?
Yes, they complement: one cuts output, the other inputs. Stack for max headroom in long sessions.