
Bernstein vs. Dorothy

tl;dr — Dorothy is a free desktop app that orchestrates Claude Code, Codex, Gemini, and local agents with a Kanban board, a "Super Agent" delegation layer, and Telegram/Slack controls. Bernstein is a headless orchestrator for CLI coding agents that runs in a terminal or CI and stores all state in files. Dorothy is better when you want a GUI to watch and delegate across a few agents. Bernstein is better when you want unattended, budget-capped, file-state runs across 42 adapters.

Last verified: 2026-04-19. Based on the Dorothy public site and repo (github.com/Charlie85270/Dorothy).


What each tool is

Dorothy is a desktop application that presents AI coding agents through a visual Kanban interface. It can launch and monitor Claude Code, Codex, Gemini, and local agents, delegate between them through a "Super Agent" that talks to them via MCP, schedule recurring work with cron, and trigger on GitHub issues/PRs. It integrates Google Workspace as an MCP server and has Telegram/Slack bridges for remote control.

Bernstein is a task dispatch orchestrator for CLI coding agents. It decomposes a goal into tasks, assigns each task to a short-lived CLI agent (across 42 adapters: Claude Code, Codex, OpenAI Agents SDK v2, Gemini CLI, Cursor, Aider, Amp, GitHub Copilot, Droid, Crush, etc.), verifies the result against external criteria (tests, linter), and merges the output. The orchestrator is deterministic Python — no LLM makes scheduling decisions. No GUI.

The core difference: Dorothy gives you a visual control plane. Bernstein gives you a headless, reproducible, file-state control plane.


Feature comparison

| Feature | Bernstein | Dorothy |
|---|---|---|
| Interface | CLI + TUI + JSON status endpoint | Desktop app (Kanban, dashboard) |
| Agent coverage | 42 CLI adapters | Claude Code, Codex, Gemini, local |
| Scheduler | Deterministic Python, no LLM | "Super Agent" (LLM) via MCP |
| Verification | Janitor: tests, linter, file checks | None built-in |
| Parallel execution | Yes, independent tasks run concurrently | Yes, up to ~10 agents |
| Git worktree isolation | Yes, per agent | No |
| State | File-based (.sdd/, survives crashes) | Application state |
| Remote control | CLI, SSH, REST | Telegram, Slack |
| Trigger sources | Manual, CI, cron via your own runner | Built-in cron, GitHub issues/PRs |
| Self-evolution | Yes, --evolve mode | No |
| Model routing | Cost-aware bandit across providers | Per-agent |
| Headless / overnight | Yes, --headless + budget cap | Via Telegram/Slack; app must be running |
| Open source | Apache 2.0 | MIT |
| Chat bridges (Telegram / Discord / Slack) (only Bernstein as a first-class CLI bridge) | ✓ (bernstein chat serve --platform=telegram\|discord\|slack, with /run, /approve, /reject, /switch, /stop) | ~ (Telegram/Slack remote control; desktop app must be running) |
| SSH remote sandbox (only Bernstein) | ✓ (bernstein remote test/run/forget <host>, ControlMaster reuse) | No |
| Lifecycle hooks: pre/post task, merge, spawn (only Bernstein) | ✓ (bernstein hooks, with shell scripts or pluggy @hookimpl) | No |
| Auto-PR with janitor gate + cost summary (only Bernstein) | ✓ (bernstein pr) | No |
| Tunnel wrapper: cloudflared / ngrok / bore / tailscale (only Bernstein) | ✓ (bernstein tunnel start/list/stop) | No |
| Interactive mid-run tool-call approval (only Bernstein) | ✓ (bernstein approve-tool / reject-tool, with --latest, --id, --always) | ~ (the Super Agent asks in-app) |
| Daemon / service install: systemd / launchd (only Bernstein) | ✓ (bernstein daemon install/start/stop/status) | No |
| Primary use case | Unattended parallel coding with verification | Visual orchestration of a small agent fleet |

Architecture comparison

Dorothy (desktop + Super Agent):

Desktop app (Kanban, dashboard, logs)
    |
    v
Super Agent (LLM) -- talks to agents via MCP
    |
    +-- Claude Code         (one project, one window)
    +-- Codex               (one project, one window)
    +-- Gemini              (one project, one window)
    +-- local agent         (one project, one window)

Triggers: cron, GitHub issue/PR webhook, Telegram/Slack command

Dorothy's value is visual delegation: you watch the Kanban board, approve tasks, and let the Super Agent route work to the right agent. The app must be running for agents to execute.

Bernstein (headless task dispatch):

bernstein -g "goal"  (terminal)
    |
    v
Task server (deterministic Python, .sdd/ files)
    |
    +-- Task A -> claude (isolated worktree) -> janitor -> merge
    +-- Task B -> codex  (isolated worktree) -> janitor -> merge
    +-- Task C -> gemini (isolated worktree) -> janitor -> merge

Verification: pytest + ruff + file checks before merge

Bernstein's value is unattended operation: bernstein --headless --budget 20 works through the backlog until it is empty or the budget is hit. No GUI, no app to keep open.
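The diagram above can be read as a plain loop. The sketch below is a minimal, hypothetical model of that flow, not Bernstein's actual code: `Task`, `RunResult`, `run_backlog`, the stub agents, and the check functions are illustrative names. Scheduling is a fixed rule (task-id order), so the same backlog always dispatches the same way; a task is merged only if every janitor check passes; and dispatch stops once the next task would exceed the budget. For simplicity it runs tasks one after another, whereas Bernstein runs independent tasks concurrently in separate worktrees.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical signatures: an "agent" turns a task description into output text,
# and a janitor check inspects that output and returns pass/fail.
AgentFn = Callable[[str], str]
CheckFn = Callable[[str], bool]


@dataclass
class Task:
    task_id: str
    description: str
    adapter: str      # which agent runs it, e.g. "claude" or "codex"
    est_cost: float   # estimated USD for one attempt (hypothetical field)


@dataclass
class RunResult:
    merged: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)
    spent: float = 0.0


def janitor(output: str, checks: list[CheckFn]) -> bool:
    """Merge gate: every external check must pass; the agent's self-report is ignored."""
    return all(check(output) for check in checks)


def run_backlog(tasks: list[Task], agents: dict[str, AgentFn],
                checks: list[CheckFn], budget: float) -> RunResult:
    result = RunResult()
    # Fixed rule instead of an LLM: dispatch in task-id order, so the same
    # backlog always produces the same schedule.
    ordered = sorted(tasks, key=lambda t: t.task_id)
    for i, task in enumerate(ordered):
        if result.spent + task.est_cost > budget:
            # Budget cap: stop dispatching; the rest stays queued for a later run.
            result.skipped = [t.task_id for t in ordered[i:]]
            break
        output = agents[task.adapter](task.description)
        result.spent += task.est_cost
        if janitor(output, checks):
            result.merged.append(task.task_id)
        else:
            result.failed.append(task.task_id)
    return result


# Usage with stub agents and stub checks (stand-ins for pytest / ruff / file checks).
agents = {"claude": lambda d: f"patch for {d}", "codex": lambda d: f"patch for {d}"}
checks = [lambda out: out.startswith("patch"), lambda out: "TODO" not in out]
tasks = [
    Task("A", "add endpoint", "claude", 5.0),
    Task("B", "write tests TODO", "codex", 5.0),
    Task("C", "update docs", "claude", 15.0),
]
res = run_backlog(tasks, agents, checks, budget=20.0)
print(res)  # A merged, B rejected by the janitor, C skipped by the budget cap
```

Keeping the scheduler a pure function of the task list is what makes a run reproducible: the LLMs only produce output, and the decision to merge comes from checks you can rerun yourself.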


When GUI + delegation beats headless dispatch

  • You're actively watching. A Kanban board shows what's running and what's stuck. Bernstein's TUI shows the same information, but Dorothy's GUI is more discoverable.
  • You want to approve individual tasks. Dorothy's Super Agent asks; Bernstein assumes you encoded the acceptance criteria in the plan file and the janitor checks.
  • You already run Telegram/Slack for team coordination. Dorothy's bridges slot in.
  • Your agents do diverse work, not just coding. Dorothy's Google Workspace MCP means an agent can read email, update a doc, book a meeting. Bernstein's janitor expects "code changes + tests pass."

When headless dispatch beats GUI + delegation

  • The run must complete without anyone watching. Overnight, weekend, CI, remote server. Bernstein's --headless --budget runs until the work is done or the budget is spent. Dorothy wants its app running and an approving hand on Telegram.
  • You want file-state you can check into git. .sdd/ is text. You can diff it, grep it, and resume from it after a crash. Dorothy's state lives inside the app.
  • Verification is non-negotiable. Bernstein won't merge unless the janitor's signals pass. Dorothy leaves that to the agent and the user.
  • You need 42 adapters, not 4. Cursor, Aider, Amp, Kilo, Kiro, Goose, OpenCode, Qwen, Cody, Continue.dev, Ollama, IAC, OpenAI Agents SDK v2, Cloudflare Agents, GitHub Copilot, Droid, Hermes, Auggie, Kimi, Rovo, Cline, Codebuff, Pi, Mistral, Autohand, Forge, Plandex, OpenHands, OpenInterpreter, AIChat, GPTMe, Charm, Composio, Letta Code, Ralphex, generic — Bernstein wraps them all. Dorothy currently advertises Claude Code, Codex, Gemini, and local.
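The file-state point can be made concrete with a small sketch. It assumes a hypothetical layout of one pretty-printed JSON file per task under a `.sdd/`-style directory; Bernstein's real `.sdd/` format is not documented here, and `save_task` / `load_pending` are illustrative names. Each write goes to a temporary file that is then atomically renamed with `os.replace`, so a crash leaves either the old or the new version on disk, never a half-written one. Resuming means re-reading the directory and re-queuing every task that never reached a terminal state.

```python
import json
import os
import tempfile
from pathlib import Path


def save_task(state_dir: Path, task_id: str, status: str, **fields) -> Path:
    """Write one task's state as sorted, indented JSON (diffable and greppable)."""
    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / f"{task_id}.json"
    record = {"id": task_id, "status": status, **fields}  # hypothetical schema
    # Write to a temp file in the same directory, then atomically replace the target.
    fd, tmp = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
        fh.write("\n")
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, target)
    return target


def load_pending(state_dir: Path) -> list[str]:
    """After a restart, return the tasks that never reached a terminal state."""
    pending = []
    for path in sorted(state_dir.glob("*.json")):  # *.tmp leftovers are ignored
        record = json.loads(path.read_text())
        if record["status"] not in ("merged", "failed"):
            pending.append(record["id"])
    return pending


# Usage: simulate a crash while task B is running.
with tempfile.TemporaryDirectory() as root:
    sdd = Path(root) / ".sdd" / "tasks"  # hypothetical path
    save_task(sdd, "A", "merged", adapter="claude")
    save_task(sdd, "B", "running", adapter="codex")
    save_task(sdd, "C", "queued", adapter="gemini")
    pending = load_pending(sdd)
    print(pending)  # ['B', 'C'] would be re-queued on restart
```

Because every record is plain text with sorted keys, two snapshots of the directory diff cleanly, which is what makes committing run state to git practical.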

When to use Dorothy instead

  • You want a dashboard. Kanban view, per-agent status, visible logs.
  • You want to delegate through a Super Agent. LLM-routed work across a small fleet.
  • You live in Telegram/Slack. Remote-trigger agents from chat.
  • Your work crosses Google Workspace. Email, docs, calendar, not just code.

When to use Bernstein instead

  • The task decomposes into parallel independent subtasks. REST endpoints + tests + docs can all happen simultaneously in isolated worktrees.
  • You need external verification. Tests either pass or fail — agent consensus is irrelevant. Bernstein's janitor enforces this.
  • You want 42-adapter coverage. Bernstein wraps Claude Code, Codex, Gemini CLI, Cursor, Aider, Amp, Kilo, Kiro, Goose, OpenCode, Qwen, Cody, Continue.dev, Ollama, IAC, and a generic adapter — among 42 total.
  • You want cost-aware model routing. Bernstein's bandit router assigns cheap models to simple tasks and escalates to stronger models as complexity rises.
  • You want headless, overnight operation. bernstein --headless --budget 20.00 runs until the backlog is empty or the budget runs out, retrying failures automatically.
  • You want a checkable audit trail. .sdd/ files, HMAC-chained logs, per-task cost + quality metrics.
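The routing bullet above can be illustrated with a standard multi-armed bandit. This is not Bernstein's router: it is a hypothetical epsilon-greedy policy that keeps separate statistics per complexity tier and scores each model as observed pass rate minus a cost penalty. The class name, model names, prices, and the `cost_weight` and `epsilon` parameters are all made up for the example. With feedback from the janitor, a cheap model keeps the simple tier while it keeps passing, and the hard tier drifts to the model that actually passes.

```python
import random
from collections import defaultdict


class CostAwareRouter:
    """Hypothetical epsilon-greedy bandit with one set of arms (models) per tier."""

    def __init__(self, prices: dict[str, float], cost_weight: float = 0.5,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.prices = prices              # hypothetical USD per task attempt
        self.cost_weight = cost_weight    # how strongly cost is penalised
        self.epsilon = epsilon            # probability of exploring a random model
        self.rng = random.Random(seed)    # seeded, so routing is reproducible
        self.stats = defaultdict(lambda: [0, 0])  # (tier, model) -> [passes, attempts]

    def score(self, tier: str, model: str) -> float:
        passes, attempts = self.stats[(tier, model)]
        # Optimistic prior: an untried model counts as always passing until tried.
        pass_rate = passes / attempts if attempts else 1.0
        return pass_rate - self.cost_weight * self.prices[model]

    def choose(self, tier: str) -> str:
        models = sorted(self.prices)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(models)                      # explore
        return max(models, key=lambda m: self.score(tier, m))   # exploit

    def update(self, tier: str, model: str, passed: bool) -> None:
        entry = self.stats[(tier, model)]
        entry[0] += int(passed)
        entry[1] += 1


# Usage: the cheap model only passes simple tasks; the strong one passes everything.
router = CostAwareRouter({"cheap-model": 0.1, "strong-model": 1.0})
for _ in range(200):
    for tier in ("simple", "hard"):
        model = router.choose(tier)
        passed = tier == "simple" or model == "strong-model"  # stand-in janitor verdict
        router.update(tier, model, passed)

router.epsilon = 0.0  # pure exploitation for the final decision
print(router.choose("simple"), router.choose("hard"))  # cheap-model strong-model
```

The cost penalty is what separates this from a plain success-rate bandit: when two models both pass, the cheaper one wins.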
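HMAC chaining is what makes a log like this tamper-evident. The sketch below shows the general technique with Python's standard `hmac` module, not Bernstein's actual log format: each entry's MAC covers the previous entry's MAC plus its own canonical JSON payload, so editing, deleting, or reordering any entry breaks verification from that point on. The entry fields, the all-zero genesis value, and the inline key are assumptions; a real deployment would load the key from a secret store.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical "previous MAC" for the first entry


def _mac(key: bytes, prev: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so the same event always hashes the same.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, (prev + payload).encode(), hashlib.sha256).hexdigest()


def append_entry(log: list[dict], key: bytes, event: dict) -> dict:
    prev = log[-1]["mac"] if log else GENESIS
    entry = {"prev": prev, "event": event, "mac": _mac(key, prev, event)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict], key: bytes) -> bool:
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False  # an entry was removed or reordered
        if not hmac.compare_digest(_mac(key, prev, entry["event"]), entry["mac"]):
            return False  # the event or its MAC was altered
        prev = entry["mac"]
    return True


# Usage: record two task results, then tamper with a recorded cost.
key = b"hypothetical-secret-key"
log: list[dict] = []
append_entry(log, key, {"task": "A", "status": "merged", "cost_usd": 0.42})
append_entry(log, key, {"task": "B", "status": "failed", "cost_usd": 0.17})
ok_before = verify_chain(log, key)
log[0]["event"]["cost_usd"] = 0.01
ok_after = verify_chain(log, key)
print(ok_before, ok_after)  # True False
```

Using an HMAC rather than a bare hash means someone who edits the log cannot simply recompute the chain without also holding the key.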

See also