Everyone expected Claude Sub-agents to be the answer to multi-step AI workflows. Specialized agents with isolated contexts, model routing, tool restrictions, background execution—on paper, it’s elegant. The pitch was simple: define your specialists, let Claude decide when to delegate, and watch it orchestrate. But there’s a problem buried deep in the architecture, one that becomes unavoidable the moment you need a repeatable pipeline instead of an interactive chat session.
The issue isn’t the agents themselves. It’s who decides what happens next.
The Problem With LLMs Making Routing Decisions
In Claude Sub-agents, the parent LLM reads your request, scans descriptions of available sub-agents, and decides which one to spawn. It chooses the prompt. It picks the sequencing. It decides when to synthesize results or spawn more agents. Every routing decision is probabilistic inference—next-token prediction dressed up as orchestration.
“The LLM decides: whether to delegate at all, which sub-agent to spawn, what prompt to write for the sub-agent, when to synthesize results vs. spawn more agents.”
This works fine when you’re in the loop. You ask Claude to review code, it picks the code-reviewer agent, you get feedback, you correct it. Interactive. Exploratory. Forgiving. But scale that to a CI/CD pipeline—plan, code, test, review, deploy—and you’re asking the LLM to be a reliable router. LLMs are not reliable routers. They forget steps. They miscount iterations. They silently skip transitions. On Tuesday they nail it; on Wednesday they hallucinate an extra step or skip the entire testing phase.
This isn’t a flaw in Claude’s intelligence. It’s a fundamental mismatch between what LLMs do well (creative reasoning, synthesis, explanation) and what industrial workflows require (deterministic sequencing, exit codes, retry policies).
How We Actually Run Workflows (Hint: Not Like This)
Consider how a hospital handles surgical procedures. Nobody hands a surgeon a list of five specialists and says “figure out the order.” Instead: pre-op handles assessment, surgical team does the procedure, pathology tests samples, post-op manages recovery. The order is written down. The roles are assigned. Creativity lives within each step—the surgeon improvises based on what they find—but the overall structure is deterministic.
Software pipelines follow the same logic. Test before deploy isn’t a creative suggestion—it’s a business rule. Retry with exponential backoff isn’t a vibe—it’s a policy. If tests fail, the pipeline stops. No LLM inference involved. These decisions were made before the code ran.
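That distinction is easy to see in config form. As a rough sketch, loosely modeled on common CI systems (the keys below are illustrative, not any specific tool's schema):

```yaml
# Ordering, gating, and retry are declared up front,
# before anything runs. No inference decides this.
steps:
  - name: test
    run: make test
    retry:
      max_attempts: 3
      backoff: exponential    # a policy, not a vibe
  - name: deploy
    run: ./deploy.sh
    needs: [test]             # deploy never runs if tests fail
```

The engine reading this file cannot forget that `deploy` depends on `test`; the dependency is data, not a prediction.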
Claude Sub-agents blur this distinction. Sub-agents can't even spawn other sub-agents, so chaining requires the parent to orchestrate. But the parent's orchestration logic is just more next-token prediction. You've replaced a workflow engine with a language model and hoped for the best.
The Separation That Matters
Not every part of a pipeline needs the same level of determinism. Code generation? Let the LLM be creative. Code review? Reasoning freely is the entire point. Planning and breaking tasks into subtasks? Inherently creative work.
But step ordering? That’s deterministic. Retry logic? Deterministic. Quality gates? Deterministic. Error handling? Deterministic. These are orchestration concerns, and they should live in config, not in inference.
This is where duckflux enters the picture. It’s a declarative YAML-based workflow DSL that flips the mental model entirely. The execution order lives in a config file. The runtime handles sequencing, loops, parallelism, retries, events, and tracing. Each step can invoke an LLM, run a shell command, call an HTTP API, or trigger a sub-workflow. The LLM does creative work inside the step. The DSL handles the plumbing between steps.
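To make the model concrete, here's a hypothetical sketch of such a workflow. The field names are assumed for illustration—they are not taken from duckflux's actual schema:

```yaml
# Illustrative only: step ordering and dependencies live in config;
# the LLM only does creative work inside individual steps.
workflow: build-and-review
steps:
  - id: plan
    type: llm
    model: claude-haiku            # hypothetical model identifier
    prompt: "Break the ticket into implementation subtasks."
  - id: implement
    type: llm
    model: claude-opus
    prompt: "Implement the subtasks: {{ steps.plan.output }}"
  - id: test
    type: shell                    # deterministic: the exit code decides
    command: make test
    retry: { max_attempts: 2 }
  - id: review
    type: llm
    prompt: "Review the diff against the test output."
    depends_on: [implement, test]  # ordering is data, not inference
```

Whatever the real syntax looks like, the point is the same: the runtime reads the graph; the LLM never sees it.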
Here’s the shift in thinking:
With Claude Sub-agents: Define agents → LLM routes → hope it chains correctly → results synthesize somehow.
With duckflux: Define workflow in YAML → runtime executes deterministically → LLM executes within each step → results flow through the pipeline as designed.
When Does This Actually Break?
A small, exploratory task? Claude Sub-agents are fine. You’re in the loop, you can correct routing mistakes, you can nudge the LLM toward the right agent. But the moment you need:
- Repeatable results — same input, same output, every time.
- Background jobs — runs happening without human supervision.
- Complex branching — “if test fails, run debug suite; if debug finds nothing, page on-call.”
- Audit trails — “which step failed and why, exactly.”
- Cost control — “use Haiku for cheap work, Opus for hard reasoning.”
…you’re in duckflux territory. The LLM stops being a router and starts being a worker. It’s a fundamentally different architecture.
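Branching and cost control, in particular, reduce to plain config once the router is deterministic. Another hypothetical sketch with illustrative keys:

```yaml
# Illustrative only: branching and model routing declared as data.
steps:
  - id: test
    type: shell
    command: make test
    on_failure: debug              # "if test fails, run debug suite"
  - id: debug
    type: llm
    model: claude-opus             # hard reasoning gets the expensive model
    prompt: "Diagnose the failing tests: {{ steps.test.output }}"
  - id: summarize
    type: llm
    model: claude-haiku            # cheap work gets the cheap model
    prompt: "Summarize the debug findings for the audit log."
  - id: page-oncall
    type: http
    url: https://oncall.example.invalid/page   # hypothetical endpoint
    when: "{{ steps.debug.output == '' }}"     # "if debug finds nothing"
```

Each branch, retry, and model choice is inspectable before the run and auditable after it—exactly the properties inference-based routing can't guarantee.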
The Honest Take
Claude Sub-agents are a genuinely clever design. They let you express agent behavior in natural language, and for interactive work, that’s powerful. Anthropic’s docs are clear about the trade-off: sub-agents can’t chain without parent coordination, and that parent coordination is inference-based.
The mistake is treating this as a feature. It’s a limitation, and an acknowledged one. The moment you scale beyond interactive exploration, you hit the ceiling.
duckflux doesn’t try to be clever. It’s boring in the best way. Workflows are boring. Routers should be boring. The creativity should happen inside the steps, where an LLM excels. The plumbing should be deterministic, where a workflow engine excels.
What This Means for Builders
If you’re building internal tools or one-off scripts, Claude Sub-agents are worth trying. The developer experience is smooth. But if you’re building production systems, CI/CD pipelines, or any workflow that needs to run reliably in the background—duckflux or similar deterministic orchestration layers are non-negotiable. You’re not giving up LLM power; you’re channeling it into the right part of the pipeline.
The question isn’t whether LLMs should be involved. It’s where. And the answer is: inside the steps, not routing between them.
Frequently Asked Questions
Can Claude Sub-agents handle complex multi-step pipelines? Technically yes, but the LLM has to orchestrate each transition, which introduces unreliability. For deterministic pipelines (test before deploy, retry on failure), you need a workflow engine, not inference-based routing.
Is duckflux a replacement for Claude Sub-agents? No. They solve different problems. duckflux replaces the orchestration layer; Claude Sub-agents (or any LLM) can be called as a step within duckflux. The architectures are complementary, not competing.
How do I know if I need deterministic orchestration? If you’re asking “what if the LLM forgets this step,” you need deterministic orchestration. If you’re in interactive exploration, you don’t.