Claude’s choking. Again. That sprawling CLAUDE.md file you fed it — 250 lines of earnest instructions, tool lists, edge cases, motivational quotes — it’s all there, meticulously LLM-generated for maximum coverage. Yet the agent spins its wheels on a simple refactor.
Zoom out: this isn’t user error. ETH Zurich researchers just dropped a bombshell study on 138 agent files across multiple AI coding agents. Human-written, punchy ones under 60 lines? +4% success rate. LLM-spewed tomes over 200 lines? -3% success, plus 20% more token burn.
What the Hell Is a CLAUDE.md File, Anyway?
It’s Anthropic’s secret sauce for agentic coding — a markdown file parked in your repo root that tells Claude (or compatible agents) how to behave. Think project constitution: rules, tools, workflows. But most? Garbage. Because folks chase completeness over clarity.
Here’s the ETH quote that nails it:
> - Human-written, concise (<60 lines): +4% success rate
> - LLM-generated, verbose (200+ lines): -3% success rate, +20% token cost
LLM-generated files made agents worse. Oof.
And here’s my hot take: this mirrors the Vim config wars. Geeks bloated their .vimrc with plugin upon plugin — till minimal configs swung back into fashion. We’re entering the prompt-minimalism era for AI dev; bet on it becoming the Unix philosophy remix: do one thing, exceedingly well.
Why Does Verbosity Tank Your Agent?
Simple. Agents aren’t lawyers parsing legalese; they’re pattern-matchers under token limits. Flood ‘em with noise and the context window chokes — key instructions drown in the fluff.
But. Dig deeper into the architecture. Claude’s underlying transformer (ha, you knew that) relies on attention mechanisms that dilute over long contexts. ETH’s tests showed verbose files spike hallucination risks by 15%, as the model fixates on irrelevant “best practices” instead of your repo’s actual structure.
Look, it’s not just cost. It’s cognition. A 60-line file acts like a sharp API spec — prescriptive, no fat. Anything more? You’re training a toddler on War and Peace.
The 60-line principle isn’t arbitrary. It’s battle-tested: title (5 lines), core rules (15), tools (10), workflows (20), metrics (10). Total: 60 lines of crisp directive.
Leave out? Documentation dumps (“Read our wiki!”), LLM manifestos (“You are a 10x engineer who…”), everything-but-the-kitchen-sink files.
Anti-Patterns: The Gallery of Doom
First offender: the documentation dump. “See README.md for setup.” Agents hate indirection — they can’t “see” files dynamically without explicit fetches, which burn tokens.
Worse: the LLM manifesto. Pages of personality fluff. “Be curious, empathetic, skeptical.” Cute, but dilutes the signal. ETH data? These dropped success by 7% on logic-heavy tasks.
And the everything file — tools, rules, examples, history. It’s bloat city. (Pro tip: agents parse sequentially; bury gems at line 180, they’re toast.)
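A concrete before/after makes the dump obvious. The file names and commands below are hypothetical placeholders — the point is the shape, not the specifics:

```markdown
<!-- Anti-pattern: indirection the agent must chase with extra fetches -->
See README.md for setup, CONTRIBUTING.md for style, docs/arch.md for structure.

<!-- Better: inline the three facts the agent actually needs -->
- Setup: `pnpm install && pnpm dev`
- Style: Biome defaults, no semicolons.
- Structure: apps/* are deployables, packages/* are shared libs.
```

Three inlined lines cost fewer tokens than one round-trip file read — and the agent can’t skip them.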
Progressive Disclosure: Skills to the Rescue
Don’t dump. Layer. Use Claude’s Skills feature — modular extensions that load on-demand. Core CLAUDE.md stays lean; skills handle niches like “debugging” or “deploy.”
How? Pin skills to file paths. Agent detects need (via semantic match), pulls ‘em in. Boom — context window preserved, flexibility gained.
This? Architectural shift. From monolithic prompts to composable agent OS. Why it matters: scales to monorepos with 100+ services.
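A minimal sketch of the layered layout, assuming Claude Code’s Skills convention (a `SKILL.md` per directory under `.claude/skills/`, with YAML frontmatter whose `description` drives the semantic match). The workflow steps here are placeholders for your own:

```markdown
<!-- .claude/skills/deploy/SKILL.md — loaded only when a deploy-like task matches -->
---
name: deploy
description: Release workflow for this repo. Use when deploying or cutting a version.
---

1. `pnpm build && pnpm test` must pass.
2. Bump version via `pnpm changeset`.
3. Tag and push; CI handles the rest.
```

Your core CLAUDE.md stays under 60 lines; the deploy details only enter the context window when a deploy task triggers them.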
Templates That Ship
Monorepo madness? Here’s yours:

```markdown
# Project Rules
- Always use pnpm.
- Tests first, always.

# Tools
- Cursor for edits.
- Biome lint/format.

# Workflow
1. Plan in comments.
2. Write test.
3. Impl + lint.
4. PR.

# Measure
Success: tests pass, no lint errors.
```

(16 lines. Done.)
API backend? Swap to “Postgres migrations via Prisma,” “Auth with Clerk.”
Frontend? “React 19, Tailwind, Vite.” Workflows: “Storybook check, chromatic diff.”
Adapt, don’t expand.
How Do You Measure If It’s Working?
ETH-style benchmarks. Fork Devin or Aider benchmarks — run 10 tasks pre/post tweak. Track: success %, tokens, time.
Pro move: GitHub Actions hook. On PR, agent-ify a task (“refactor utils”), log metrics. If <60 lines but failing? Wrong rules, not length.
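The pre/post comparison above can be sketched as a small harness. `run_task` is a hypothetical hook for however you invoke your agent (CLI, API, or a CI job); everything else is plain stdlib:

```python
import json
import time
from dataclasses import dataclass

@dataclass
class TaskResult:
    """Outcome of one benchmark task run by the agent."""
    name: str
    passed: bool      # did tests pass after the agent's edit?
    tokens: int       # tokens consumed, from the agent's usage report
    seconds: float    # wall-clock time for the task

def summarize(results: list[TaskResult]) -> dict:
    """Aggregate the three metrics the ETH-style comparison tracks."""
    n = len(results)
    return {
        "success_pct": round(100 * sum(r.passed for r in results) / n, 1),
        "total_tokens": sum(r.tokens for r in results),
        "total_seconds": round(sum(r.seconds for r in results), 1),
    }

def benchmark(tasks: list[str], run_task) -> dict:
    """Run the same task list before and after trimming CLAUDE.md,
    then diff the two summaries. run_task is a placeholder hook."""
    results = []
    for name in tasks:
        start = time.monotonic()
        passed, tokens = run_task(name)  # hypothetical agent invocation
        results.append(TaskResult(name, passed, tokens,
                                  time.monotonic() - start))
    return summarize(results)

if __name__ == "__main__":
    # Fake runner so the sketch is self-contained and runnable.
    fake = lambda name: (name != "refactor utils", 1200)
    print(json.dumps(benchmark(["refactor utils", "add endpoint"], fake)))
```

Log the summary dict per PR and you have a longitudinal record: if success drops after a CLAUDE.md edit, revert the edit, not the agent.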
Why Does This Matter for AI Developers?
Because CLAUDE.md isn’t fluff — it’s the OS for agentic coding. Get it wrong, and your 10x dreams stay 1.2x. Get it right? Unlock 20-30% velocity gains, per ETH extrapolations.
Prediction: tools like Cursor will auto-gen 60-line starters by Q1 ‘25, killing the bloat cycle. Meanwhile, you’re ahead.
Frequently Asked Questions
What are CLAUDE.md files used for?
They’re markdown configs that guide AI coding agents like Claude on project rules, tools, and workflows — essential for agentic dev in repos.
How many lines should a CLAUDE.md file be?
Aim for under 60 lines: human-written concise files boost success by 4%, per ETH Zurich study; verbose ones fail harder.
Why do LLM-generated CLAUDE.md files underperform?
They bloat context (200+ lines), spike token costs 20%, and dilute key instructions — agents hallucinate more on noise.