Bernstein: LLM-Free Multi-Agent Orchestrator

Multiple LLMs promised coding bliss. They delivered hellish file overwrites and broken tests. Enter Bernstein: a no-nonsense Python orchestrator that actually works.

[Diagram] Bernstein pipeline: an LLM decomposes the goal into a task graph; parallel agents run in git worktrees; a janitor verifies; git merges.

Key Takeaways

  • Ditch LLM schedulers for deterministic Python—saves tokens, sanity.
  • Git worktrees + file ownership = conflict-free parallel agents.
  • LinUCB bandit auto-tunes models per task, slashing costs 23%.

What if the real bottleneck in your AI dev workflow isn’t the models—it’s letting them schedule themselves?

I’ve chased every shiny LLM agent hype cycle for years, from Claude’s code wizardry to Gemini’s CLI tricks. Last year? Ten tickets, one week, three agents ripping apart the same auth.py like jealous exes. Chaos. Pure, expensive chaos.

The author nails it here: “Agent A edits auth.py. Agent B edits auth.py. Agent A’s changes get silently overwritten.” That’s not collaboration; that’s a war zone. And yeah, I tried the ‘manager LLM’ fix. Hallucinated assignments. Token bleed on chit-chat. Waste.

But here’s the kick—scheduling ain’t rocket science. Or neural nets. OSes cracked it decades ago. Why reinvent with LLMs?

Why Did LLMs Fail at Herding LLMs?

Look, LLMs shine at creativity, pattern-matching wild ideas. Not rote task juggling. The original setup? An LLM ‘coordinator’ reading backlogs, assigning roles, babysitting fails. Sounds smart. Costs a fortune in tokens—40% overhead, per the post. Forgets priorities. Re-plans into loops. It’s like hiring a poet to run a factory.

Rip it out. Replace with Python determinism. Boom: Bernstein. Open-source multi-agent orchestrator for CLI coders. Zero LLM tokens on scheduling. Just works.

The pipeline? Elegant, brutal simplicity.

Decompose: One LLM hit—your goal becomes a task graph. Roles, files owned, deps, done signals.

Spawn: Fresh agent per task, isolated git worktree. Parallel madness, main branch safe.

Verify: Janitor bot checks reality—tests green? Linter quiet? Types solid? No ‘vibes-based’ merges.

Merge: Winners land. Losers retry or reroute.

That’s it. Event loop polls tasks, matches agents, lifecycles ‘em. Run twice? Same outcome. Auditable. Reproducible. No AI roulette.
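That loop is easy to picture in plain Python. A minimal sketch of deterministic readiness checks plus file-ownership enforcement (all names here are illustrative, not Bernstein's actual API):

```python
# Sketch: deterministic task scheduling with exclusive file ownership.
# Illustrative only; Bernstein's real data model may differ.
from dataclasses import dataclass

@dataclass
class Task:
    id: str
    role: str                # e.g. "backend", "security"
    files: tuple             # files this task owns exclusively
    deps: tuple = ()         # task ids that must finish first
    status: str = "pending"  # pending -> running -> done / failed

def ready_tasks(tasks):
    """Tasks whose deps are done and whose files clash with no running task."""
    done = {t.id for t in tasks if t.status == "done"}
    claimed = {f for t in tasks if t.status == "running" for f in t.files}
    return [
        t for t in tasks
        if t.status == "pending"
        and set(t.deps) <= done
        and not claimed & set(t.files)
    ]

tasks = [
    Task("t1", "backend", ("auth.py",), status="done"),
    Task("t2", "backend", ("auth.py",), deps=("t1",)),
    Task("t3", "frontend", ("ui.tsx",), deps=("t1",)),
]
runnable = ready_tasks(tasks)  # t2 and t3: deps met, no file overlap
```

Run it twice on the same graph, get the same answer. That is the whole point.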

And the genius hack? Git worktrees.

Each agent gets its own playground: git worktree add .sdd/worktrees/session-abc123 -b agent/session-abc123. Each one gets the illusion of owning the whole repo. No stomps, no locks. The janitor verifies and merges clean, because the orchestrator enforces file ownership: no two parallel agents touch the same file.

Heavy dirs like node_modules? Symlinked from the main checkout. No reinstall cost per agent. Smart.
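The command above generalizes into a tiny helper. A sketch, assuming a POSIX system with git on the PATH; the session naming mirrors the post's example, but the function itself is hypothetical:

```python
# Sketch: give each agent an isolated worktree (illustrative, not Bernstein's API).
import os
import subprocess
import uuid

def spawn_worktree(repo_root, shared_dirs=("node_modules",)):
    session = uuid.uuid4().hex[:8]
    path = os.path.join(repo_root, ".sdd", "worktrees", f"session-{session}")
    branch = f"agent/session-{session}"
    # New branch plus checkout in one step; main stays untouched.
    subprocess.run(
        ["git", "worktree", "add", path, "-b", branch],
        cwd=repo_root, check=True,
    )
    # Symlink heavy directories instead of duplicating them per agent.
    for d in shared_dirs:
        src = os.path.join(repo_root, d)
        if os.path.isdir(src):
            os.symlink(src, os.path.join(path, d))
    return path, branch
```

When the janitor is done, `git worktree remove` cleans up and the branch either merges or dies.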

How Does Bernstein Pick the Right Model Without Hallucinating?

Not every rename needs Opus-level brainpower. Static rules flop—tasks blur. Enter LinUCB contextual bandit. Learns from history.

Task context: complexity tier, scope, role (backend? security?), token guess. Arms: haiku, sonnet, opus.

Reward? Quality × (1 − normalized cost). Cheap wins if it passes the janitor.

Cold start? A cascade: haiku for simple tiers, opus for hairy roles. Once warmed up, the bandit takes over. The policy persists to JSON, accumulating smarts across runs.

Cuts costs 23%. Most tasks? Boilerplate fodder for cheapies.

“In practice, this cuts costs by roughly 23% compared to using the same model for everything, because most tasks are boilerplate that cheap models handle fine.”

Damn right.
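To make the mechanics concrete, here is a toy LinUCB. The arm names come from the post; the feature vector, alpha, and reward wiring are illustrative assumptions, not Bernstein's actual code:

```python
# Toy LinUCB contextual bandit: pick a model arm given task features.
import numpy as np

class LinUCB:
    def __init__(self, arms, dim, alpha=1.0):
        self.alpha = alpha
        self.arms = arms
        self.A = {a: np.eye(dim) for a in arms}    # per-arm design matrix
        self.b = {a: np.zeros(dim) for a in arms}  # per-arm reward vector

    def choose(self, x):
        x = np.asarray(x, dtype=float)
        def ucb(a):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]              # ridge-regression estimate
            return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(self.arms, key=ucb)             # exploit + exploration bonus

    def update(self, arm, x, reward):
        x = np.asarray(x, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Context: [complexity_tier, scope, token_estimate], all normalized to [0, 1].
# Reward per the post: quality * (1 - normalized cost).
bandit = LinUCB(["haiku", "sonnet", "opus"], dim=3)
arm = bandit.choose([0.2, 0.1, 0.3])
bandit.update(arm, [0.2, 0.1, 0.3], reward=0.9)
```

Each update shrinks the uncertainty bonus for the chosen arm on that kind of context, so a cheap model that keeps passing the janitor keeps getting picked.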

I’ve seen this movie before. Early ’90s, everyone hyped ‘smart’ process schedulers with fuzzy logic. Unix? Stuck to deterministic queues. Won. Because factories don’t need intuition—they need reliability. Bernstein’s that Unix for agents. Skeptical vet insight: We’re fooling ourselves thinking LLMs scale coordination. They don’t. This bandit twist? Predicts the next wave: hybrid brains where Python owns the rails, models ride ‘em.

Bernstein vs. the Hype Machines

CrewAI, AutoGen, LangGraph—they’re LLM-heavy. Flashy graphs, agent chats. Token vampires.

Bernstein? Lean Python. CLI-native. Git-powered isolation.

The comparison, laid out:

| Feature | Bernstein | CrewAI | AutoGen | LangGraph |
| --- | --- | --- | --- | --- |
| Scheduling | Deterministic Python | LLM-driven | LLM-heavy | Graph/LLM |
| Isolation | Git worktrees | Shared state | Shared state | Varies |
| Cost | Bandit-optimized | High | High | High |
| Verification | Hard signals | Soft | Soft | Soft |

It’s not even close for production crunch.

But cynicism check: Open-source? Great. Will it stick? Maintainer burnout’s real. Still, CLI focus screams dev-first—not VC demo.

Who profits? Not Anthropic on scheduler tokens. You—faster cycles, lower bills. Silicon Valley’s agent gold rush? This reins it in.

Picture scaling to 50 agents. Worktrees stack cheap. Bandit tunes fleet-wide. Deadlines shrink.

One gripe: Setup’s git-savvy. Noobs stumble. But that’s a feature—filters hype-chasers.

Why Does This Matter for Real Dev Teams?

Agents solo? Fun toy. Teams? Multi-agent hell without rails. Bernstein’s your foreman.

Bold call: By 2026, every serious AI dev shop runs something like this. LLM schedulers fade like rule-based expert systems did.

It’s the quiet revolution. No buzz. Just code that ships.



Frequently Asked Questions

What is Bernstein orchestrator? Bernstein’s an open-source Python tool coordinating CLI AI coding agents via git worktrees, deterministic scheduling, and a bandit model router—no LLM tokens wasted on coordination.

How does Bernstein handle agent conflicts? Isolates each agent in its own git worktree, enforces file ownership to prevent overlaps, verifies with hard checks (tests, linter), then merges clean.

Is Bernstein cheaper than CrewAI or AutoGen? Yes—23% cost cut via bandit routing to cheap models for simple tasks, plus zero scheduling tokens versus their LLM-heavy approaches.

Written by Aisha Patel

Former ML engineer turned writer. Covers computer vision and robotics with a practitioner perspective.



Originally reported by dev.to
