Everyone expected LLMs to be code-generation saviors, churning out clean, modular architectures that’d make senior devs jealous. Implicit coupling? Nah, that’s old-school human error, right? Wrong. These experiments flip the script: AI agents birth the stuff effortlessly, then master its maintenance.
Look, implicit coupling — that sneaky beast where files share unwritten rules, no functions or docs to guide you — plagues human codebases. Peter Naur nailed it back in ‘85: programming’s really theory-building in your head. Skip the write-down, and coupling festers silently.
But with LLMs? The author tested Claude (Opus 4.6) and Codex (GPT-5.4 xhigh) across three setups: a greenfield notification service, extensions to it, and a gnarly TypeScript doc manager riddled with couplings, bugs, and gaps.
Everything's in a public repo: prompts, outputs, evals. No smoke.
The Greenfield Trap: Where Coupling Is Born
Both agents got a functional spec for a 300-line notifier. No arch hints.
Both coughed up one fat service class hogging all the logic, with only the types split out. Structural coupling galore, hidden in names and flows.
Take marketingOptOuts. A baked-in business rule screams from the field name. Add another opt-out category? Ripples hit the interface, the store, and the guards, and the compiler just shrugs.
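To make that concrete, here's a minimal TypeScript sketch of the pattern; everything except marketingOptOuts is invented for illustration, not lifted from the repo:

```typescript
// Hypothetical sketch -- only marketingOptOuts comes from the experiment.
interface UserPreferences {
  marketingOptOuts: Set<string>; // channel IDs the user opted out of
}

function shouldNotify(
  prefs: UserPreferences,
  channel: string,
  kind: "marketing" | "transactional"
): boolean {
  // The guard hard-codes the one opt-out category. Adding, say, a
  // security opt-out means touching this guard, the interface above,
  // and whatever store persists it -- and the compiler flags none of it.
  return !(kind === "marketing" && prefs.marketingOptOuts.has(channel));
}
```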
Or take MonetaryAmount jammed by hand into audit records. Claude duplicated the type; Codex flattened its fields. Flattening is worse, since later exchangeRate tweaks demand parallel surgery.
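A hedged sketch of the two shapes (field names assumed, not copied from the repo):

```typescript
// Shared domain type.
interface MonetaryAmount {
  value: number;
  currency: string;
}

// Nested shape (roughly Claude's choice): the audit record reuses the type.
interface NestedAuditRecord {
  amount: MonetaryAmount;
}

// Flattened shape (roughly Codex's choice): the fields are copied in.
// Any later change to MonetaryAmount demands a matching edit here.
interface FlatAuditRecord {
  amountValue: number;
  amountCurrency: string;
}
```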
LLMs don’t see it. Reviewers won’t either.
“LLMs create implicit coupling when building from scratch. They just don’t know they’re doing it, and neither does the reviewer.”
That’s the killer quote. Spot on.
Why Did Extensions Expose the Pain?
Four additions: security alerts, a critical priority, exchangeRate, and audit reasons. The agents hunted through the code solo.
They nailed every coupling spot. Zero misses in small-context worlds.
But the greenfield sins haunted them. Claude's nested AuditRecord auto-absorbed exchangeRate. Smooth. Codex's flat mess demanded an exchangeRateRate field: code-smell city, forced rewrites.
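Extending the earlier sketch shows the fork in the road (again, a hypothetical reconstruction):

```typescript
// The extension adds an exchange rate to money values.
interface MonetaryAmount {
  value: number;
  currency: string;
  exchangeRate: number; // new field
}

// Nested record: absorbs exchangeRate with zero edits.
interface NestedAuditRecord {
  amount: MonetaryAmount;
}

// Flat record: needs a hand-added parallel field. Mechanical flattening
// is how you end up with doubled-up names like the reported exchangeRateRate.
interface FlatAuditRecord {
  amountValue: number;
  amountCurrency: string;
  amountExchangeRate: number;
}
```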
Design debt accrues. Early choices dictate extension hell.
Here's my twist, one the original skips: this echoes the 1970s structured-programming wars. Dijkstra pushed explicit modules to kill spaghetti code; LLMs are goto-loving hippies in disguise, scattering theory until forced to map it. Bold call: we'll need theory-extraction agents next, reverse-engineering docs from LLM-spawned chaos. Corporate hype says 'AI codes better.' Nah. It codes faster, and messier.
The brownfield doc system (619 TypeScript lines, 9 files) layered a lot in at once:
- intentional skips: bulk delete fires no webhooks, a moderation mercy
- silent gaps: legal-hold checks nuked in cascades
- a concat bug discarding arrays (sketched after this list)
- dead configs
- invariant erosion across the delete hops, retention -> folder delete -> removeByFolderId, rules vanishing like mist
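The concat bug is worth a sketch, because it's a classic JavaScript footgun; this is a guess at its shape, not the repo's exact code:

```typescript
// Array.prototype.concat returns a NEW array; it never mutates in place.
let webhookEvents: string[] = ["doc.deleted"];

// Buggy: the merged array is built, returned, and silently discarded.
webhookEvents.concat(["attachment.removed", "counter.updated"]);

// Fixed: keep the result (or mutate with push(...items) instead).
webhookEvents = webhookEvents.concat(["attachment.removed", "counter.updated"]);
```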
First task: layer onBeforeDelete hook over four paths.
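A minimal sketch of what the task demands, with hypothetical method names; removeByFolderId is the only path named in the write-up:

```typescript
// Hypothetical store: one hook must fire on every path that deletes a doc.
type BeforeDeleteHook = (docId: string) => void;

class DocumentStore {
  private hooks: BeforeDeleteHook[] = [];

  onBeforeDelete(hook: BeforeDeleteHook): void {
    this.hooks.push(hook);
  }

  private fireHooks(docId: string): void {
    for (const hook of this.hooks) hook(docId);
  }

  // Path 1: direct delete.
  delete(docId: string): void {
    this.fireHooks(docId);
    // ... actual removal
  }

  // Path 2: bulk delete.
  bulkDelete(docIds: string[]): void {
    for (const id of docIds) this.fireHooks(id);
  }

  // Path 3: folder cascade ends in removeByFolderId.
  removeByFolderId(folderId: string): void {
    for (const id of this.docIdsInFolder(folderId)) this.fireHooks(id);
  }

  // Path 4: retention sweep.
  runRetentionSweep(): void {
    for (const id of this.expiredDocIds()) this.fireHooks(id);
  }

  private docIdsInFolder(_folderId: string): string[] { return []; }
  private expiredDocIds(): string[] { return []; }
}
```

Miss any one of the four call sites and the hook silently never fires: exactly the implicit coupling the task probes.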
Codex: 4/4 coverage. Claude: 4/4. Both crushed cascades.
The results, paraphrasing the write-up's table:

| Agent | Hook coverage | Notable miss or win |
|-------|---------------|---------------------|
| Codex | 4/4 | Botched a gap fix |
| Claude | 4/4 | Aced the invariants |
But wait: the full results show LLMs shine at inference, tracing behaviors humans miss.
Can LLMs Untangle Brownfield Hell?
Deeper tasks: fix gaps, hunt bugs.
The agents spotted the accidental gaps: folder counters drifting, orphaned attachments, missing audits and webhooks. The legal-hold cascade failure? Found. The concat bug? Claude fixed it; Codex whiffed initially.
Dead config? Ignored — smart, it’s vestigial.
Invariant chain? They reinforced rules at each hop.
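Here's a hedged sketch of what "reinforced at each hop" can look like, using the legal-hold rule; the names are invented:

```typescript
interface Doc {
  id: string;
  legalHold: boolean;
}

// The invariant lives in ONE function...
function assertDeletable(doc: Doc): void {
  if (doc.legalHold) {
    throw new Error(`Document ${doc.id} is under legal hold`);
  }
}

// ...and every hop re-asserts it instead of trusting its caller.
function deleteDocument(doc: Doc): void {
  assertDeletable(doc); // hop 1: direct delete
  // ... removal
}

function removeByFolderId(docsInFolder: Doc[]): void {
  for (const doc of docsInFolder) {
    assertDeletable(doc); // hop 2: the cascade checks again
    // ... removal
  }
}
```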
LLMs grok behavioral deps in context. Maintenance? Their jam.
Yet greenfield warns: let ‘em build first, pay later.
Why Does This Flip AI Coding Hype?
Expectations were god-tier modularity. Reality: LLMs mimic average dev habits, monolithic and coupled, because the training data is full of them.
Shift: use LLMs for brownfield surgery, humans for greenfield blueprints. Or prompt harder for explicit contracts.
My prediction? Tools like this repo force a new era: coupling detectors as pre-commit hooks, grading LLM outputs before merge.
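Nothing like that ships today, but as a toy illustration, a crude detector could be as dumb as flagging property names duplicated across many interfaces:

```typescript
// Toy coupling-smell detector (illustrative only, not an existing tool).
// Flags property names declared three or more times across the files
// passed on the command line -- a crude proxy for "shape copied around
// instead of shared through one type".
import { readFileSync } from "node:fs";

const counts = new Map<string, number>();
for (const file of process.argv.slice(2)) {
  const source = readFileSync(file, "utf8");
  // Rough match for indented "name:" or "name?:" property declarations.
  for (const match of source.matchAll(/^\s{2,}(\w+)\??:\s/gm)) {
    counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
  }
}

const suspects = [...counts].filter(([, n]) => n >= 3).map(([name]) => name);
if (suspects.length > 0) {
  console.error(`Possibly duplicated shapes: ${suspects.join(", ")}`);
  process.exit(1); // non-zero exit fails the pre-commit hook
}
```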
Company spin (Anthropic/OpenAI) pushes 'vibe coding.' Call it what it is: procrastination on architecture.
And here's the asymmetry, in one punch: maintenance wins, generation loses.
In that doc system, cascading deletes stripped rules: retention forgot webhooks, counters, holds. The LLMs re-injected them surgically, proof that context windows let them build mental models rivaling a veteran's. But from scratch? No model, just pattern-matching against messy web data. Naur's theory-building demands iteration; LLMs fake it in one pass.
Implications for autonomous agents like Devin? Stack greenfield on greenfield and watch it explode.
🧬 Related Insights
- Read more: Typewriters to Tokens: Why Software Engineering’s Dirty Secrets Endure
- Read more: SSL Certificates Shrink to 47 Days: The Forced March to Automation
Frequently Asked Questions
What is implicit coupling in code?
It’s unwritten rules between files — behaviors you infer, no contracts. Kills maintainability.
Do LLMs handle legacy code better than new projects?
Yes — experiments show they trace couplings flawlessly in context, but invent them greenfield.
Should I use AI for greenfield coding?
Prompt for modularity, review ruthlessly — or stick to brownfield fixes.