Deterministic vs. LLM-Based Orchestration¶

A side-by-side comparison of two approaches to multi-agent coordination, with architecture diagrams showing the structural differences.

The two approaches¶

LLM-based orchestration¶

An LLM (the "manager agent") sits in the control plane. It reads status from running agents, reasons about priorities, assigns tasks, and decides when work is complete. This is the model used by CrewAI's hierarchical process, AutoGen's GroupChat, and LangGraph's supervisor node.

graph TD
    Goal["User Goal\n(vague or structured)"]
    Manager["Manager LLM\n(reads agent status, assigns tasks,\ndecides priorities)"]
    A1["Agent 1\n(persistent session)"]
    A2["Agent 2\n(persistent session)"]
    A3["Agent 3\n(persistent session)"]
    Bus["Message Bus / Bulletin\n(agents signal status, hunger,\nblockers to manager)"]
    Done["Done?\n(manager decides)"]

    Goal --> Manager
    Manager -->|"assign tasks"| A1 & A2 & A3
    A1 & A2 & A3 -->|"status updates,\nhunger signals"| Bus
    Bus --> Manager
    Manager --> Done

    classDef llm fill:#f9c74f,stroke:#f3722c,color:#000
    classDef agent fill:#90be6d,stroke:#43aa8b,color:#000
    classDef problem fill:#f94144,stroke:#c1121f,color:#fff
    class Manager llm
    class A1,A2,A3 agent

What goes wrong at scale:

graph TD
    PA["PAPA (Manager LLM)\nfalls asleep →\nall queues drain"]
    MUFFY["MUFFY\n283 BULLETIN messages\n0 code commits\n138 STARVING/DYING/FEED ME"]
    SMARTY["SMARTY\n2 real commits / 40 claimed\nconfused its own work\nwith others' after hours of drift"]
    Phantom["5 phantom agents\n(pido, phoenix, penny, victor, goku)\n200+ noise messages\n0 useful output)"]
    Fail["Result: 737 tasks / 47 hours\n~3 of 12 agents did real work"]

    PA --> MUFFY & SMARTY & Phantom --> Fail

    classDef bad fill:#f94144,stroke:#c1121f,color:#fff
    class PA,MUFFY,SMARTY,Phantom,Fail bad

Deterministic orchestration (Bernstein)¶

A Python process owns the control plane. It is a scheduler — it applies rules mechanically, makes no LLM calls, and has no concept of "reasoning." Agents are spawned with pre-assigned tasks, execute them, and exit. There is no idle state.

graph TD
    Goal["User Goal\nor plans/task.yaml"]
    Manager["Manager LLM\n(optional — only for\ngoal decomposition)"]
    TaskServer["Task Server\nREST API :8052\nFile-backed state in .sdd/"]
    Orch["Orchestrator\n(deterministic Python)\nno LLM calls"]
    TP["Tick Pipeline\nfetch · batch · prioritize"]
    TL["Task Lifecycle FSM\nOPEN → CLAIMED → IN_PROGRESS\n→ DONE → CLOSED"]
    AL["Agent Lifecycle\nheartbeat · crash detection · reap"]
    Spawner["Spawner\nbuild prompt · select adapter\nlaunch in git worktree"]
    WT1["Agent A\ngit worktree\n1-3 tasks → exit"]
    WT2["Agent B\ngit worktree\n1-3 tasks → exit"]
    WT3["Agent C\ngit worktree\n1-3 tasks → exit"]
    QG["Quality Gates\nlint · typecheck · tests · PII"]
    Janitor["Janitor\nverify concrete signals\n(file exists, tests pass,\nnot agent claims)"]
    Git["Git\ncommit / PR / merge"]

    Goal -->|"with plan file:\nskip LLM"| TaskServer
    Goal -->|"without plan file:\none-time call"| Manager --> TaskServer
    TaskServer --> Orch
    Orch --> TP & TL & AL
    TP & TL & AL --> Spawner
    Spawner --> WT1 & WT2 & WT3
    WT1 & WT2 & WT3 --> QG --> Janitor --> Git

    classDef det fill:#4361ee,stroke:#3a0ca3,color:#fff
    classDef agent fill:#90be6d,stroke:#43aa8b,color:#000
    classDef verify fill:#7b2d8b,stroke:#560bad,color:#fff
    classDef optllm fill:#f9c74f,stroke:#f3722c,color:#000
    class Orch,TP,TL,AL,Spawner,TaskServer det
    class WT1,WT2,WT3 agent
    class QG,Janitor,Git verify
    class Manager optllm

Task state machine (deterministic FSM)¶

The task lifecycle is a state machine implemented in Python. Every transition has a defined trigger. There are no judgment calls.

stateDiagram-v2
    [*] --> PLANNED: plan loaded
    PLANNED --> OPEN: stage deps satisfied
    OPEN --> CLAIMED: agent calls claim_next()
    CLAIMED --> IN_PROGRESS: agent begins execution
    IN_PROGRESS --> DONE: agent reports completion
    DONE --> CLOSED: janitor verification passed + merged
    IN_PROGRESS --> ORPHANED: heartbeat timeout / crash
    ORPHANED --> OPEN: retry (max_retries default=3)
    CLAIMED --> OPEN: claim expired (agent died before starting)
    IN_PROGRESS --> FAILED: max retries exceeded
    OPEN --> BLOCKED: blocking dependency added
    BLOCKED --> OPEN: blocker resolved
    OPEN --> CANCELLED: explicit cancellation
    DONE --> OPEN: janitor rejected (signals failed)
    IN_PROGRESS --> WAITING_FOR_SUBTASKS: auto-decomposed
    WAITING_FOR_SUBTASKS --> IN_PROGRESS: subtasks CLOSED

Agent lifecycle (short-lived by design)¶

Agents are born with work and die when done. There is no idle state, no hunger state, no polling loop.

sequenceDiagram
    participant O as Orchestrator
    participant S as Spawner
    participant A as Agent (git worktree)
    participant J as Janitor
    participant G as Git

    O->>O: fetch open tasks, group by role
    O->>S: spawn(tasks=[T1, T2, T3], role=backend)
    S->>A: build prompt + inject tasks + create worktree
    activate A
    A->>A: execute T1
    A->>A: execute T2
    A->>A: execute T3
    A->>O: POST /tasks/T1/complete
    A->>O: POST /tasks/T2/complete
    A->>O: POST /tasks/T3/complete
    deactivate A
    Note over A: agent exits — no idle state

    O->>J: verify T1, T2, T3
    J->>J: check concrete signals<br/>(file exists, tests pass, lint clean)
    J->>G: merge worktree to main
    J->>O: tasks CLOSED

    O->>O: next tick — spawn fresh agent for next batch

Structural comparison¶

Property	LLM-based	Deterministic (Bernstein)
Control plane	LLM agent (manager/PAPA)	Python scheduler (no LLM calls)
Scheduling decisions	LLM reasoning on each tick	Code: priority queue + role grouping
Token cost of coordination	Thousands/tick (manager context)	Zero
Agent lifetime	Persistent session (indefinite)	Bounded: 1-3 tasks then exit
Idle state	Yes — agents poll, spam hunger signals	No — agents are dead when not working
Sleep failure mode	Critical — unrecoverable without human	Impossible — dead agents cannot sleep
Context drift	Yes — long sessions accumulate stale context	No — fresh context on every spawn
Scheduling reproducibility	Non-deterministic (LLM sampling)	Deterministic — same input = same decision
Debuggability	Cannot reproduce scheduling bugs	Unit-testable state machine
Verification	Agent claims ("I finished X")	Concrete signals (file exists, tests pass)
Manager failure mode	Cascading — all agents starve	N/A — no manager
Scalability	O(agents × tasks) in manager context	O(1) per agent per tick
Coordination auditability	LLM response logs (non-reproducible)	Python call stack (fully traceable)

Token cost model¶

LLM-based orchestration¶

Per scheduling tick:
  Manager context  ≈ 5,000–20,000 tokens (grows with task count and agent count)
  Manager response ≈ 500–2,000 tokens

Per agent per idle hour:
  Hunger polling   ≈ 500–5,000 tokens (spinning, spam, status checks)
  MUFFY example:     ~50,000 tokens on hunger signaling, 0 code commits

For 737 tasks over 47 hours, 12 agents:
  Coordination overhead: tens of millions of tokens
  Useful work ratio:      ~3/12 agents = 25%

Deterministic orchestration¶

Per scheduling tick:
  Orchestrator CPU ≈ <1ms, 0 tokens

Per agent spawn:
  System prompt    ≈ 1,500–3,000 tokens (one-time per batch)
  Amortized (3 tasks/batch) ≈ 500–1,000 tokens per task

For 737 tasks over 47 hours, 12 agents:
  Coordination overhead: 0 tokens
  Useful work ratio:      ~100% (agents only run when they have work)

When each approach makes sense¶

Use LLM-based orchestration when:¶

Task count is small (< ~20) and agent count is small (< 4)
Tasks are loosely structured and require creative routing decisions
You want a demo that explains itself in plain language
The orchestration logic itself is your product (you're selling the reasoning)

Use deterministic orchestration when:¶

Running more than a handful of parallel agents
Task definitions are concrete (clear acceptance criteria)
You need predictable cost (budget limits, production use)
You need the system to run unattended for hours without human babysitting
Debugging and auditability matter
You're using agents as workers, not collaborators

ADR-001: Agent Lifecycle Model — Full analysis of hunger vs. pull vs. short-lived models with rag_challenge data
ADR-006: No Embedded LLM in the Orchestrator — Why the control plane is deterministic code
Why Deterministic Orchestration — Narrative explainer with first-principles reasoning
Architecture — Full system diagram and module breakdown

Deterministic vs. LLM-Based Orchestration¶

The two approaches¶

LLM-based orchestration¶

Deterministic orchestration (Bernstein)¶

Task state machine (deterministic FSM)¶

Agent lifecycle (short-lived by design)¶

Structural comparison¶

Token cost model¶

LLM-based orchestration¶

Deterministic orchestration¶

When each approach makes sense¶

Use LLM-based orchestration when:¶

Use deterministic orchestration when:¶

Related documents¶