Implicit Coupling in LLM Code Generation

We all figured AI coding agents would spit out pristine, modular masterpieces. Turns out, they weave invisible dependencies from scratch — then untangle them like pros in legacy code.

LLMs Excel at Fixing Code Coupling — But Birth It From Scratch — theAIcatchup

Key Takeaways

  • LLMs introduce implicit coupling when generating code from scratch, mirroring human flaws.
  • In maintenance and brownfield scenarios, they excel at tracing and fixing hidden dependencies.
  • Greenfield design quality massively impacts extension costs — AI's no exception.

Everyone expected LLMs to be code-generation saviors, churning out clean, modular architectures that’d make senior devs jealous. Implicit coupling? Nah, that’s old-school human error, right? Wrong. These experiments flip the script: AI agents birth the stuff effortlessly, then master its maintenance.

Look, implicit coupling — that sneaky beast where files share unwritten rules, no functions or docs to guide you — plagues human codebases. Peter Naur nailed it back in ‘85: programming’s really theory-building in your head. Skip the write-down, and coupling festers silently.

But with LLMs? The author tested Claude (Opus 4.6) and Codex (GPT-5.4 xhigh) across three setups: greenfield notification service, extensions, then a gnarly TypeScript doc manager riddled with couplings, bugs, gaps.

Everything’s public repo — prompts, outputs, evals. No smoke.

The Greenfield Trap: Where Coupling Is Born

Both agents got a functional spec for a 300-line notifier. No arch hints.

Both coughed up one fat service class hogging all the logic, with only the type definitions split out. Structural coupling galore, hidden in names and flows.

Take marketingOptOuts. Baked-in business rule screams from the field name. Add another opt-out? Ripples hit interface, store, guards — compiler shrugs.
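A minimal sketch of that failure mode (field and function names here are mine, not the repo's):

```typescript
// A business rule -- "only marketing messages are opt-out-able" -- lives
// entirely in a field name. Nothing marks the places that must change
// together when that rule grows.

interface UserPrefs {
  marketingOptOuts: string[]; // the rule is implicit in the name
}

// A guard elsewhere silently depends on that field's meaning.
function canSend(prefs: UserPrefs, channel: string, category: string): boolean {
  if (category === "marketing" && prefs.marketingOptOuts.includes(channel)) {
    return false;
  }
  // Adding, say, a securityOptOuts field means touching the interface,
  // the store, and every guard like this one. Miss a spot and the
  // compiler says nothing.
  return true;
}
```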

Or take MonetaryAmount jammed manually into audit records. Claude duplicated the nested shape; Codex flattened it into parallel fields. Flattening is worse, since any later exchangeRate change demands parallel surgery.
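Roughly what those two shapes look like, reconstructed from the article's description (every identifier except MonetaryAmount is my guess):

```typescript
// The shared value type.
interface MonetaryAmount {
  value: number;
  currency: string;
}

// "Claude-style": MonetaryAmount's shape copy-pasted into the audit
// record. Structurally identical today; nothing ties it to the
// original tomorrow.
interface NestedAuditAmount {
  value: number;
  currency: string;
}

interface NestedAuditRecord {
  action: string;
  amount: NestedAuditAmount;
}

// "Codex-style": the same data flattened into parallel scalar fields,
// kept in sync with MonetaryAmount by hand. Nothing enforces it.
interface FlatAuditRecord {
  action: string;
  amountValue: number;
  amountCurrency: string;
}

function auditNested(action: string, amount: MonetaryAmount): NestedAuditRecord {
  return { action, amount: { value: amount.value, currency: amount.currency } };
}

function auditFlat(action: string, amount: MonetaryAmount): FlatAuditRecord {
  return { action, amountValue: amount.value, amountCurrency: amount.currency };
}
```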

LLMs don’t see it. Reviewers won’t either.

“LLMs create implicit coupling when building from scratch. They just don’t know they’re doing it, and neither does the reviewer.”

That’s the killer quote. Spot on.

Why Did Extensions Expose the Pain?

Four adds: security alerts, critical priority, exchangeRate, audit reasons. Agents hunt code solo.

They nailed every coupling spot. Zero misses in small-context worlds.

But greenfield sins came back to haunt. Claude's nested AuditRecord absorbed exchangeRate automatically. Smooth. Codex's flat mess? exchangeRateRate: code smell city, forced rewrites.
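The extension-cost gap, sketched under the same assumed names (only exchangeRateRate comes from the article):

```typescript
// exchangeRate arrives as a new field on the shared shape.
interface MonetaryAmount {
  value: number;
  currency: string;
  exchangeRate?: number; // the new requirement
}

// Nested record: reusing the type means the new field rides along free.
interface NestedAuditRecord {
  amount: MonetaryAmount;
}

// Flat record: every new amount field forces a hand-added twin. The
// article reports Codex landed on the smell-named exchangeRateRate.
interface FlatAuditRecord {
  amountValue: number;
  amountCurrency: string;
  exchangeRateRate?: number;
}

function toFlat(a: MonetaryAmount): FlatAuditRecord {
  // this function must change for every MonetaryAmount addition
  return {
    amountValue: a.value,
    amountCurrency: a.currency,
    exchangeRateRate: a.exchangeRate,
  };
}
```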

Design debt accrues. Early choices dictate extension hell.

Here’s my twist — one the original skips: this echoes the 1970s structured programming wars. Dijkstra pushed explicit modules to kill spaghetti. LLMs? They’re Goto-loving hippies in disguise, scattering theory until forced to map it. Bold call: we’ll need theory-extraction agents next, reverse-engineering docs from LLM-spawned chaos. Corporate hype says ‘AI codes better’ — nah, it codes faster, messier.

The brownfield doc system (619 TypeScript lines across 9 files) was seeded with traps:

  • Intentional skips: bulk delete fires no webhooks (a moderation mercy).
  • Silent gaps: legal-hold checks dropped in cascades.
  • A concat bug that discards arrays.
  • Dead config options.
  • Invariant erosion across delete hops: retention -> folder delete -> removeByFolderId, rules vanishing like mist at each step.
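That concat bug class in miniature (a generic sketch, not the repo's actual code):

```typescript
// Array.prototype.concat returns a NEW array; it never mutates the
// receiver. Calling it for its side effect silently discards data.

function mergeRulesBuggy(existing: string[], incoming: string[]): string[] {
  existing.concat(incoming); // result thrown away -- incoming rules vanish
  return existing;
}

function mergeRulesFixed(existing: string[], incoming: string[]): string[] {
  return existing.concat(incoming); // keep the returned array
}
```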

First task: layer onBeforeDelete hook over four paths.

Codex: 4/4 coverage. Claude: 4/4. Both crushed cascades.

The results, paraphrasing the article's table: Codex nailed hook coverage but botched one gap fix; Claude aced the invariant checks.

But wait — full results show LLMs shine in inference, tracing behaviors humans miss.

Can LLMs Untangle Brownfield Hell?

Deeper tasks: fix gaps, hunt bugs.

Agents spotted accidentals — folder counters drifting, orphan attachments, missing audits/webhooks. Legal-hold cascade fail? Found. Concat bug? Claude fixed; Codex whiffed initially.

Dead config? Ignored — smart, it’s vestigial.

Invariant chain? They reinforced rules at each hop.
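"Reinforce at each hop" in miniature (a hypothetical sketch, assuming a legalHold flag on documents):

```typescript
// Instead of trusting upstream callers to have checked the rule, every
// delete path re-checks the legal-hold invariant itself.

type Doc = { id: string; legalHold: boolean };

function deleteDoc(docs: Map<string, Doc>, id: string): boolean {
  const doc = docs.get(id);
  if (!doc) return false;
  if (doc.legalHold) return false; // invariant re-checked here, not upstream
  docs.delete(id);
  return true;
}

function cascadeDelete(docs: Map<string, Doc>, ids: string[]): string[] {
  // the cascade goes through the guarded primitive, so held docs survive
  return ids.filter((id) => deleteDoc(docs, id));
}
```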

LLMs grok behavioral deps in context. Maintenance? Their jam.

Yet greenfield warns: let ‘em build first, pay later.

Why Does This Flip AI Coding Hype?

Expectations were god-tier modularity. Reality: LLMs mimic average dev habits — monolithic, coupled — because training data’s full of it.

Shift: use LLMs for brownfield surgery, humans for greenfield blueprints. Or prompt harder for explicit contracts.

My prediction? Tools like this repo force a new era: coupling detectors as pre-commit hooks, grading LLM outputs before merge.

Company spin (Anthropic/OpenAI) pushes ‘vibe coding’ — callout: it’s procrastination on architecture.

And here’s the asymmetry: the same agent that spins implicit coupling into fresh code can trace and repair it once it already exists on the page.

Maintenance wins. Generation loses.

In that doc system, cascading deletes stripped rules — retention forgot webhooks, counters, holds. LLMs re-injected them surgically, proving context windows let ‘em build mental models rivaling vets. But from scratch? No model, just pattern-matching to messy web data. Naur’s theory-building demands iteration — LLMs fake it in one pass.

Implications for agents like Devin? Stack greenfield build on greenfield build, and the coupling compounds until it explodes.



Frequently Asked Questions

What is implicit coupling in code?

It’s unwritten rules between files — behaviors you infer, no contracts. Kills maintainability.

Do LLMs handle legacy code better than new projects?

Yes — experiments show they trace couplings flawlessly in context, but invent them greenfield.

Should I use AI for greenfield coding?

Prompt for modularity, review ruthlessly — or stick to brownfield fixes.

Elena Vasquez
Written by

Senior editor and generalist covering the biggest stories with a sharp, skeptical eye.



Originally reported by Dev.to
