Everyone expected Claude Sub-agents to be the answer to multi-step AI workflows. Specialized agents with isolated contexts, model routing, tool restrictions, background execution—on paper, it’s elegant. The pitch was simple: define your specialists, let Claude decide when to delegate, and watch it orchestrate. But there’s a problem buried deep in the architecture, one that becomes unavoidable the moment you need a repeatable pipeline instead of an interactive chat session.
The issue isn’t the agents themselves. It’s who decides what happens next.
The Problem With LLMs Making Routing Decisions
In Claude Sub-agents, the parent LLM reads your request, scans descriptions of available sub-agents, and decides which one to spawn. It chooses the prompt. It picks the sequencing. It decides when to synthesize results or spawn more agents. Every routing decision is probabilistic inference—next-token prediction dressed up as orchestration.
“The LLM decides: whether to delegate at all, which sub-agent to spawn, what prompt to write for the sub-agent, when to synthesize results vs. spawn more agents.”
This works fine when you’re in the loop. You ask Claude to review code, it picks the code-reviewer agent, you get feedback, you correct it. Interactive. Exploratory. Forgiving. But scale that to a CI/CD pipeline—plan, code, test, review, deploy—and you’re asking the LLM to be a reliable router. LLMs are not reliable routers. They forget steps. They miscount iterations. They silently skip transitions. On Tuesday they nail it; on Wednesday they hallucinate an extra step or skip the entire testing phase.
This isn’t a flaw in Claude’s intelligence. It’s a fundamental mismatch between what LLMs do well (creative reasoning, synthesis, explanation) and what industrial workflows require (deterministic sequencing, exit codes, retry policies).
How We Actually Run Workflows (Hint: Not Like This)
Consider how a hospital handles surgical procedures. Nobody hands a surgeon a list of five specialists and says “figure out the order.” Instead: pre-op handles assessment, surgical team does the procedure, pathology tests samples, post-op manages recovery. The order is written down. The roles are assigned. Creativity lives within each step—the surgeon improvises based on what they find—but the overall structure is deterministic.
Software pipelines follow the same logic. Test before deploy isn’t a creative suggestion—it’s a business rule. Retry with exponential backoff isn’t a vibe—it’s a policy. If tests fail, the pipeline stops. No LLM inference involved. These decisions were made before the code ran.
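That distinction is easy to see in config form. As a rough sketch, loosely modeled on common CI systems (the keys below are illustrative, not any specific tool's schema):

```yaml
# Ordering, gating, and retry are declared up front,
# before anything runs. No inference decides this.
steps:
  - name: test
    run: make test
    retry:
      max_attempts: 3
      backoff: exponential    # a policy, not a vibe
  - name: deploy
    run: ./deploy.sh
    needs: [test]             # deploy never runs if tests fail
```

The engine reading this file cannot forget that `deploy` depends on `test`; the dependency is data, not a prediction.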
Claude Sub-agents blur this distinction. Sub-agents can't even spawn other sub-agents, so chaining requires the parent to orchestrate. But the parent's orchestration logic is just more next-token prediction. You've replaced a workflow engine with a language model and hoped for the best.
The Separation That Matters
Not every part of a pipeline needs the same level of determinism. Code generation? Let the LLM be creative. Code review? Reasoning freely is the entire point. Planning and breaking tasks into subtasks? Inherently creative work.
But step ordering? That’s deterministic. Retry logic? Deterministic. Quality gates? Deterministic. Error handling? Deterministic. These are orchestration concerns, and they should live in config, not in inference.
This is where duckflux enters the picture. It’s a declarative YAML-based workflow DSL that flips the mental model entirely. The execution order lives in a config file. The runtime handles sequencing, loops, parallelism, retries, events, and tracing. Each step can invoke an LLM, run a shell command, call an HTTP API, or trigger a sub-workflow. The LLM does creative work inside the step. The DSL handles the plumbing between steps.
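To make the model concrete, here's a hypothetical sketch of such a workflow. The field names are assumed for illustration—they are not taken from duckflux's actual schema:

```yaml
# Illustrative only: step ordering and dependencies live in config;
# the LLM only does creative work inside individual steps.
workflow: build-and-review
steps:
  - id: plan
    type: llm
    model: claude-haiku            # hypothetical model identifier
    prompt: "Break the ticket into implementation subtasks."
  - id: implement
    type: llm
    model: claude-opus
    prompt: "Implement the subtasks: {{ steps.plan.output }}"
  - id: test
    type: shell                    # deterministic: the exit code decides
    command: make test
    retry: { max_attempts: 2 }
  - id: review
    type: llm
    prompt: "Review the diff against the test output."
    depends_on: [implement, test]  # ordering is data, not inference
```

Whatever the real syntax looks like, the point is the same: the runtime reads the graph; the LLM never sees it.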
Here’s the shift in thinking:
With Claude Sub-agents: Define agents → LLM routes → hope it chains correctly → results synthesize somehow.
With duckflux: Define workflow in YAML → runtime executes deterministically → LLM executes within each step → results flow through the pipeline as designed.
When Does This Actually Break?
A small, exploratory task? Claude Sub-agents are fine. You’re in the loop, you can correct routing mistakes, you can nudge the LLM toward the right agent. But the moment you need:
- Repeatable results — same input, same output, every time.
- Background jobs — runs happening without human supervision.
- Complex branching — “if test fails, run debug suite; if debug finds nothing, page on-call.”
- Audit trails — “which step failed and why, exactly.”
- Cost control — “use Haiku for cheap work, Opus for hard reasoning.”
…you’re in duckflux territory. The LLM stops being a router and starts being a worker. It’s a fundamentally different architecture.
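Branching and cost control, in particular, reduce to plain config once the router is deterministic. Another hypothetical sketch with illustrative keys:

```yaml
# Illustrative only: branching and model routing declared as data.
steps:
  - id: test
    type: shell
    command: make test
    on_failure: debug              # "if test fails, run debug suite"
  - id: debug
    type: llm
    model: claude-opus             # hard reasoning gets the expensive model
    prompt: "Diagnose the failing tests: {{ steps.test.output }}"
  - id: summarize
    type: llm
    model: claude-haiku            # cheap work gets the cheap model
    prompt: "Summarize the debug findings for the audit log."
  - id: page-oncall
    type: http
    url: https://oncall.example.invalid/page   # hypothetical endpoint
    when: "{{ steps.debug.output == '' }}"     # "if debug finds nothing"
```

Each branch, retry, and model choice is inspectable before the run and auditable after it—exactly the properties inference-based routing can't guarantee.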
The Honest Take
Claude Sub-agents are a genuinely clever design. They let you express agent behavior in natural language, and for interactive work, that’s powerful. Anthropic’s docs are clear about the trade-off: sub-agents can’t chain without parent coordination, and that parent coordination is inference-based.
The mistake is treating this as a feature. It’s a limitation, and an acknowledged one. The moment you scale beyond interactive exploration, you hit the ceiling.
duckflux doesn’t try to be clever. It’s boring in the best way. Workflows are boring. Routers should be boring. The creativity should happen inside the steps, where an LLM excels. The plumbing should be deterministic, where a workflow engine excels.
What This Means for Builders
If you’re building internal tools or one-off scripts, Claude Sub-agents are worth trying. The developer experience is smooth. But if you’re building production systems, CI/CD pipelines, or any workflow that needs to run reliably in the background—duckflux or similar deterministic orchestration layers are non-negotiable. You’re not giving up LLM power; you’re channeling it into the right part of the pipeline.
The question isn’t whether LLMs should be involved. It’s where. And the answer is: inside the steps, not routing between them.
Frequently Asked Questions
Can Claude Sub-agents handle complex multi-step pipelines? Technically yes, but the LLM has to orchestrate each transition, which introduces unreliability. For deterministic pipelines (test before deploy, retry on failure), you need a workflow engine, not inference-based routing.
Is duckflux a replacement for Claude Sub-agents? No. They solve different problems. duckflux replaces the orchestration layer; Claude Sub-agents (or any LLM) can be called as a step within duckflux. The architectures are complementary, not competing.
How do I know if I need deterministic orchestration? If you’re asking “what if the LLM forgets this step,” you need deterministic orchestration. If you’re in interactive exploration, you don’t.