AI Agents Write Production Code in CI

Everyone figured AI coding was stuck on dev laptops. Nope — these agents ship battle-tested production code straight from CI pipelines, complete with multi-angle reviews.

Key Takeaways

  • AI agents implement full Jira tickets in CI with prod-parity environments, obliterating local dev limitations.
  • Multi-perspective reviews (frontend, backend, security, etc.) ensure quality before PRs ship.
  • This heralds software factories: agents as tireless assembly lines, freeing devs for high-level work.

Picture this: your team’s staring down a polished Jira ticket, execution plan locked and loaded. Everyone’s waiting for the usual grind — you or some harried engineer to dive in, hack away, pray the tests pass.

But hold up. What if a CI agent grabs it instead?

That’s exactly what this team built. AI agents writing production code in CI. Full test pyramid. Seven-perspective reviews. Bug hunts baked in. It’s not a toy; it’s a workflow that spits out merge-ready PRs without a human keystroke on the code.

Your team gets a refined Jira ticket with a full Execution Plan. Who implements it? Not you. A CI agent does.

Boom. That’s the hook from their write-up, and damn if it doesn’t hit like a freight train.

We all expected AI to nibble at the edges — autocomplete snippets, debug helpers on your MacBook. Remember those early Copilot demos? Cute, but fragile. One wrong env var, and poof, it crumbles.

This? Game over. They’re running agents in GitHub Actions on self-hosted runners, mirroring production down to the Kafka cluster in KRaft mode. Postgres 15 humming, Redis caching, ClickHouse crunching — the works. Agent writes code, slams it through unit tests (Jest, Go), integration beasts against real DBs, even E2E Playwright runs against a live stack with NGINX proxies and blockchain minting via Foundry Anvil.

No more ‘works on my machine’ alibis. The agent lives in the real world from minute one.

Why Ditch the Laptop for CI Agents?

Laptops? They’re sandboxes for solos. Fine for brainstorming a React hook or a quick SQL tweak. But scale to prod code? Secrets leak, deps flake, tests skip the heavy lifts.

CI agents thrive here — because they must. Jira tokens? Safe in GitHub secrets. Confluence plans? Fetched live. And that environment parity? It’s not optional; it’s the floor.

Think of it like this: devs have been manual pilots forever, wrestling clouds of variables. Now, AI’s the autopilot, strapped into a cockpit that’s prod-identical. Turbulence? It just reruns the suite.

Their setup’s a beast: services firing up Postgres, Redis, ClickHouse 23.3, Kafka. Tooling like Typesense for search, NGINX for APIs. Full pyramid — units, integrations, E2Es. Agent implements, tests instantly, no excuses.
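One pattern you'd expect in a prod-parity pipeline like this (the write-up doesn't show it, so this is my sketch) is gating the test suites on service readiness. A minimal stdlib version:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll a TCP port until something is listening, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connection succeeds as soon as the service is accepting traffic.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False

# The test job would block on each dependency before running suites, e.g.
# wait_for_port("localhost", 5432) for Postgres, 6379 for Redis,
# 9092 for Kafka, 8123 for ClickHouse (standard default ports, assumed here).
```

A readiness gate like this is what turns "services firing up" into "agent tests instantly" without flaky first runs.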

Short version? Reliability skyrockets.

And here’s my hot take, one you won’t find in their post: this echoes the Ford assembly line, 1913. Back then, craftsmen hammered cars by hand — slow, error-prone. Ford’s line? Standardized, relentless, cheap. Swap humans for agents, code for chassis, and boom — we’re at the dawn of software factories. No coffee breaks, infinite parallelism. Devs won’t code lines; they’ll orchestrate symphonies.

How Does the Agent Chain Actually Work?

It’s a relay race, not a solo sprint. GitHub Actions workflow kicks off the ‘integration’ job. Grabs the Jira ticket, appends the Confluence execution plan. Then unleashes the chain.

First up: Implement Agent. Powered by Cursor CLI’s composer-2-fast model. Fed .agent-context.md (ticket deets), plan, even prior PR comments for reruns.

Its mission? Brutal. Analyze the ticket, implement changes across frontend (apps/*-webapp, libs/ui-*) and backend (NestJS services, Go libs). Write tests: units if the logic is simple, integrations for anything touching the DB, E2Es for user flows. Update the CLAUDE.md docs in each affected directory. Self-review its own git diff against a checklist, with no commit until it passes. Draft the PR description. Conventional commits: feat(DPA2-1234): add user search.
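That commit convention is easy to enforce mechanically. A hypothetical helper (the function and regex are mine, not from the post; types beyond feat are assumed):

```python
import re

# Shape of the example from the write-up: feat(DPA2-1234): add user search
CONVENTIONAL = re.compile(r"^(feat|fix|chore|refactor|test|docs)\([A-Z]+\d*-\d+\): .+")

def commit_subject(kind: str, ticket: str, summary: str) -> str:
    """Assemble a conventional-commit subject line and reject malformed ones."""
    subject = f"{kind}({ticket}): {summary}"
    if not CONVENTIONAL.match(subject):
        raise ValueError(f"not a conventional commit subject: {subject!r}")
    return subject
```

A check like this would sit between the agent's commit step and the workflow, so a malformed subject fails fast instead of landing in history.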

Agent commits locally. Workflow takes the baton.

Then, internal review squad activates. Builds pr-context: diff.patch, files.json, description. Classifies perspectives by changed paths — Frontend for webapp tweaks, Backend/Arch for server code, Security if risky files touched, Observability for logging/monitoring, QA always for app changes, PO for the Jira link.
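Path-based classification like that is simple to sketch. The directory names and trigger tokens below are my guesses at the layout, not the team's actual rules:

```python
def classify_perspectives(changed_files: list[str], has_jira_link: bool = True) -> set[str]:
    """Map changed paths to the review perspectives that should fire."""
    perspectives: set[str] = set()
    for path in changed_files:
        if path.startswith("apps/") and "webapp" in path:
            perspectives.add("Frontend")
        if path.startswith(("server/", "libs/")) or path.endswith(".go"):
            perspectives.update({"Backend", "Architecture"})
        if any(tok in path for tok in ("auth", "secrets", ".env", "Dockerfile")):
            perspectives.add("Security")
        if any(tok in path for tok in ("logging", "metrics", "tracing")):
            perspectives.add("Observability")
    if changed_files:
        perspectives.add("QA")   # QA always runs for app changes
    if has_jira_link:
        perspectives.add("PO")   # Product Owner checks the work against the ticket
    return perspectives
```

The payoff of classifying up front: you only pay for the model calls a given diff actually warrants.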

Seven eyes, Claude-4.6-opus-high models. Each spits JSON: perspective (e.g., “Senior Backend Engineer”), action (“changes_required” or “clean”), summary, detailed review, suggestions.
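Because each verdict is structured JSON, the workflow can validate it before acting on it. A minimal parser, with field names approximated from the post's description (the exact keys and validation logic are mine):

```python
import json

REQUIRED_FIELDS = {"perspective", "action", "summary", "detailed_review", "suggestions"}
VALID_ACTIONS = {"changes_required", "clean"}

def parse_review(raw: str) -> dict:
    """Parse one reviewer's JSON output and enforce the expected shape."""
    review = json.loads(raw)
    missing = REQUIRED_FIELDS - review.keys()
    if missing:
        raise ValueError(f"review missing fields: {sorted(missing)}")
    if review["action"] not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {review['action']!r}")
    return review
```

Strict parsing matters here: a reviewer that emits prose instead of JSON should fail loudly, not silently count as "clean".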

If all clean? Push the branch, open PR. Issues? Loop back — re-run implement agent with review feedback. Iterate till gold.
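The loop itself is plain control flow. A sketch, with implement and run_reviews as stand-ins for the real agent invocations, and the round budget as my own assumption:

```python
def review_loop(ticket: dict, implement, run_reviews, max_rounds: int = 3) -> dict:
    """Re-run the implement agent with reviewer feedback until every
    perspective reports 'clean' or the round budget runs out."""
    feedback: list[dict] = []
    for round_no in range(1, max_rounds + 1):
        diff = implement(ticket, feedback)    # agent implements and commits locally
        reviews = run_reviews(diff)           # one JSON verdict per perspective
        feedback = [r for r in reviews if r["action"] == "changes_required"]
        if not feedback:
            return {"status": "open_pr", "rounds": round_no}
    return {"status": "needs_human", "rounds": max_rounds, "feedback": feedback}
```

Note the escape hatch: without a cap on rounds, a disagreement between the implement agent and a reviewer could burn tokens forever.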

Wild, right? It’s like a virtual code review meeting — but agents don’t nitpick personalities; they laser on code.

Can AI Agents Nail the Full Test Pyramid?

Skeptics scoff: sure, units. But integrations? E2Es on Kafka streams?

They do. Because CI’s the great equalizer. Agent decides test scope — feasibility rules. Units for logic. Integrations hit real Postgres schemas, Redis queues, ClickHouse analytics. E2Es? Playwright puppeteering browsers against the full stack, minting blockchain test tokens.
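That "feasibility rules" heuristic can be sketched as a small decision function. The input signals and the escalation order are my reading of the article, not the agent's actual logic:

```python
def choose_test_levels(touches_db: bool, touches_ui_flow: bool) -> list[str]:
    """Pick test levels the way the article describes: units for pure logic,
    integrations when real stores are touched, E2E for user-facing flows."""
    levels = ["unit"]                   # always cover the logic itself
    if touches_db:
        levels.append("integration")    # hit real Postgres/Redis/ClickHouse
    if touches_ui_flow:
        levels.append("e2e")            # Playwright against the live stack
    return levels
```

The point is that scope escalates with blast radius: a pure function gets units, a schema change gets the whole pyramid.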

No mocks faking it. Real deps spin up in minutes. Lint passes, tests green, coverage holds. Agent even investigates bugs — trace failures, propose fixes in the loop.

This changes everything. Devs sink a huge share of their week into testing and debugging. Agents? Zero emotion, infinite patience. They grind through the pyramid far faster than any human could.

But — and here’s the futurist wonder — imagine scaling. Swarm 50 agents on parallel tickets. Jira board empties overnight. That’s the platform shift: AI as the new OS for engineering teams.

Why Does This Matter for Developers Right Now?

You’re not replaced; you’re promoted.

Humans orchestrate: refine plans, set policies, handle edge cases agents flag. Agents grind through the grunt work: CRUD endpoints, test boilerplate, doc syncs.

Bold prediction: by 2026, 30% of PRs agent-authored in forward-thinking shops. GitHub Copilot? Kid stuff. This is autonomous.

Corporate spin? Their post gushes workflow details, but glosses costs — self-hosted runners ain’t free, Claude Opus bills stack. Still, ROI screams if you’re at scale.

The energy here is palpable. AI’s not assisting; it’s executing. Wonder what your backlog looks like agent-owned?

It’s coming. Fast.


Frequently Asked Questions

What are AI agents writing production code in CI?

They’re autonomous AI workflows in GitHub Actions (or similar) that take Jira tickets, implement code changes, run full tests (unit to E2E), self-review from multiple angles, and open PRs — all without human coding.

Can AI agents handle complex integrations like Kafka and Postgres?

Yes, via self-hosted CI runners with prod-like services spun up: Postgres 15, Redis, ClickHouse, Kafka in KRaft. Agents write and validate against the real stack instantly.

Will this replace developers?

No — it automates grunt work (implementation, basic tests). Humans focus on architecture, planning, and reviewing agent outputs for nuance.

Written by James Kowalski

Investigative tech reporter focused on AI ethics, regulation, and societal impact.


Originally reported by dev.to
