Bernstein vs. Parallel Code¶

tl;dr — Parallel Code lets you manually run multiple coding agent sessions in separate windows or tmux panes. It works fine if you want to manage the parallelism yourself. Bernstein automates the coordination: task planning, model routing, verification, and merge conflict prevention. The question is whether you want to be the orchestrator or have software be the orchestrator.

Last verified: 2026-04-19. "Parallel Code" covers both the github.com/johannesjo/parallel-code desktop app (Claude Code + Codex + Gemini, git worktree per agent) and the broader manual-tmux pattern of running multiple coding agents side by side.

What each tool is¶

Parallel Code refers to both the specific desktop app at github.com/johannesjo/parallel-code (gives every agent its own git branch and worktree, one-click diff viewer, one-click merge, supports Claude Code, Codex, and Gemini) and the broader pattern of running multiple independent coding agent sessions simultaneously — typically via tmux, multiple terminal windows, or a session manager. The parallelism is human-coordinated: you decide what tasks to run, assign them to sessions, and resolve conflicts.

Bernstein automates that coordination. You provide a goal in natural language. Bernstein decomposes it into subtasks, assigns each to an agent process with the appropriate model and role, runs them in isolated git worktrees, verifies results with a janitor (tests + linter), and merges the work. The human stays out of the loop unless a task needs escalation.

Feature comparison¶

Feature	Bernstein	Parallel Code
Task planning	Automatic from natural language goal	Manual — you write and assign each task
Parallelism	Automatic — up to N concurrent agents	Manual — you open N sessions
Git isolation	Per-agent worktrees, automatic	Manual — you manage branches
Result verification	Automatic janitor (tests, linter, files)	Manual — you check output
Merge coordination	Automatic with conflict prevention	Manual — you resolve conflicts
Model routing	Cost-aware bandit (cheap models for simple tasks)	Fixed — same model per session
Failure handling	Automatic retry, quarantine	Manual — you rerun failed sessions
Cost tracking	Per-task, per-model, aggregated	None — check provider dashboard
Headless operation	Yes — `--headless` flag	No — requires human to monitor
Self-evolution	Yes — `--evolve` mode	No
Setup complexity	Low — `bernstein init`, `bernstein run`	Zero — just open more terminals

Architecture comparison¶

Parallel Code (human-coordinated):

You (the human orchestrator)
    │
    ├── Terminal 1: claude --prompt "add /users endpoint"  ──→ you review output
    ├── Terminal 2: claude --prompt "write tests for auth"  ──→ you review output
    └── Terminal 3: codex  --prompt "update API docs"       ──→ you review output
         │
         ▼
    You merge the output, resolve conflicts, check tests

You are the scheduler, the verifier, and the merge coordinator. If a session fails, you notice and restart it. If two sessions modify the same file, you resolve the conflict.

Bernstein (automated orchestration):

bernstein -g "add user management: endpoints + tests + docs"
    │
    ▼
Task planner (LLM) → 3 tasks: [endpoints, tests, docs]
    │
    ├── Task A → claude  (worktree: feat/task-a) → janitor → merge
    ├── Task B → codex   (worktree: feat/task-b) → janitor → merge
    └── Task C → gemini  (worktree: feat/task-c) → janitor → merge

You return when it's done. Or set --headless and don't return at all.

The scheduler, verifier, and merge coordinator are all software. You're not in the loop unless something escalates.

The coordination tax¶

Managing parallel agent sessions manually costs real attention:

Task assignment: Writing the right prompt for each session, ensuring no overlap
Monitoring: Watching multiple terminal outputs for completion or failure
Verification: Manually running tests after each session completes
Merge coordination: Pulling each session's output into the main branch without conflicts
Cost tracking: None — you find out at the end of the month

For 2–3 sessions on a short task, this coordination tax is acceptable. For 5+ sessions on a complex feature, it consumes most of the time savings from parallelism.

Bernstein's value proposition is eliminating the coordination tax:

Metric	Manual parallel	Bernstein
Human attention required	High (monitoring + verification)	Low (setup + review)
Verification before merge	Depends on how well you verify	Janitor-enforced (pytest + ruff + file checks)
Merge conflicts	Frequent without coordination	Worktree isolation + merge queue
Overnight operation	Not possible (needs supervision)	Yes (`--headless --budget N`)

Numeric quality claims (early pilot, n=25, +8 pp, p=0.569) live in benchmarks/README.md. Earlier copies of this page cited larger deltas that were not reproducible at n=25 and have been removed.

When manual parallel is better¶

You have 2–3 short tasks you've done before. If you know exactly what each agent should do and the tasks take under 10 minutes each, the overhead of setting up Bernstein is not worth it.
You want to watch the agent work. Parallel terminal sessions let you see the agent's reasoning in real time. Bernstein's agents write to trace files — less visible unless you're actively tailing them.
You need fine-grained control over each session. If you want to intervene mid-task, redirect the agent, or combine manual and automated work, controlling the terminals directly gives you more flexibility.
The tasks aren't independent. If each session depends on the output of the previous one, parallelism doesn't help and Bernstein's task decomposition won't either.

When Bernstein is better¶

You have 4+ tasks. Managing 4+ terminal sessions is where the coordination tax starts exceeding the time savings from parallelism. Bernstein scales linearly; human coordination doesn't.
You want verified output. Bernstein's janitor runs your tests after every task. Manual parallel requires you to remember to run tests after every session.
You want to run overnight or in CI. You can't watch 5 terminal sessions while you sleep. bernstein --headless --budget 15.00 can.
You're doing this repeatedly. One-off parallel sessions are fine manually. If you're running parallel coding agents multiple times a week, the setup investment in Bernstein pays off quickly.
You want cost data. Bernstein logs every token spent per task per model. After 10 sessions, you have data on which task types cost how much. Manual parallel gives you a monthly provider bill with no task-level breakdown.
You want model routing. Bernstein's bandit assigns cheap models (Haiku, Gemini free tier) to simple tasks and escalates complex ones. Manual parallel uses whatever model you launched each session with.

Migration path¶

Parallel Code users typically adopt Bernstein after one of these experiences:

Two sessions modified the same file and creating a messy merge conflict
A session "completed" but the tests were failing — noticed only later
Left 3 sessions running and came back to find 1 had silently errored out 2 hours ago
Tried to run 6 sessions at once and spent more time coordinating them than the tasks took

If you haven't had any of these experiences yet, your tasks might be simple enough that manual parallelism is the right tool. If any of these sound familiar, Bernstein solves exactly those problems.