Bernstein vs. Paperclip

tl;dr — Paperclip is an AI company simulator: org charts, budgets, governance hierarchies for AI agents. Bernstein is an engineering tool: spawn agents, ship code, verify results. They solve different problems. If you need corporate structure for your AI workforce, Paperclip is impressive. If you need to parallelize coding tasks and merge working branches, Bernstein is what you want.

Last verified: 2026-04-19. Based on github.com/paperclipai/paperclip (55k+ stars, launched 2026-03-02) and paperclip.ing.

What each tool is

Paperclip (55k+ stars, MIT, Node.js + React) is an "AI company management" platform, released March 2, 2026. It models AI agents as employees in an organization: they have roles, report to managers, operate within budgets, follow org-chart hierarchies, and receive scheduled heartbeats. The coordinator is an LLM. Paperclip advertises support for Claude Code, Codex, Cursor, and "OpenClaw," plus bash and HTTP hooks ("if it can receive a heartbeat, it's hired"). Think of it as an HR and project-management control plane for AI agents.

Bernstein (Apache 2.0, Python) is a multi-agent orchestrator for CLI coding agents. It breaks a goal into tasks, spawns agents in isolated git worktrees, verifies their output (tests, lint, file checks), and merges the results. The orchestrator is deterministic Python — zero LLM tokens spent on coordination. It supports 31 CLI adapters and runs anywhere Python runs.

Feature comparison

Feature	Bernstein	Paperclip
Primary focus	Ship code	Manage AI organizations
Open source	Yes — Apache 2.0	Yes — MIT
Language	Python	Node.js + React
Orchestrator logic	Deterministic code (no LLM)	LLM-based coordination
Agent adapters	31 CLI adapters	Claude Code, Codex, Cursor, OpenClaw + bash / HTTP hooks
Org charts / hierarchies	No	Yes — core feature
Budget enforcement	Cost tracking + budget caps	Yes — per-agent and per-team budgets
Task ticketing	Yes — internal task server	Yes — with goal alignment
Git worktree isolation	Yes — per agent	No
Result verification	Janitor (tests, lint, files)	Governance controls
Scheduled heartbeats	Tick pipeline (deterministic)	Yes — LLM-driven
Multi-company support	No (multi-repo workspaces)	Yes
Plan files	YAML stages + steps	Goal hierarchies
Audit trail	HMAC-chained, file-based	Activity logs
Self-evolution	Yes — `--evolve` mode	No
Protocol support	MCP, A2A	Not documented
Web UI	TUI + web dashboard	Yes — React dashboard
Cluster mode	Yes	Not documented
Chat bridges (Telegram / Discord / Slack)	Yes — `bernstein chat serve --platform=...` drives runs from chat	No
SSH remote sandbox	Yes — `bernstein remote test/run/forget <host>` with ControlMaster reuse	No
Lifecycle hooks (pre/post task, merge, spawn)	Yes — `bernstein hooks` (shell scripts or pluggy `@hookimpl`)	No
Auto-PR with janitor gate + cost summary	Yes — `bernstein pr`	No
Tunnel wrapper (cloudflared / ngrok / bore / tailscale)	Yes — `bernstein tunnel start/list/stop`	No
Interactive mid-run tool-call approval	Yes — `bernstein approve-tool` / `reject-tool`	No
Daemon / service install (systemd / launchd)	Yes — `bernstein daemon install/start/stop/status`	No
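
The "HMAC-chained" audit trail in the table refers to a general technique: each log entry carries a MAC computed over the previous entry's MAC plus its own payload, so changing an earlier entry invalidates everything after it. The following is a minimal sketch of that idea only. The record layout, the `GENESIS` placeholder, and the function names are assumptions made for illustration, not Bernstein's actual on-disk format or API.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder "previous MAC" for the first entry


def _mac(key: bytes, prev_mac: str, event: dict) -> str:
    # Canonical JSON so the same event always serializes to the same bytes.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    message = (prev_mac + payload).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append_entry(log: list, key: bytes, event: dict) -> None:
    """Append an event whose MAC also covers the previous entry's MAC."""
    prev_mac = log[-1]["mac"] if log else GENESIS
    log.append({"event": event, "mac": _mac(key, prev_mac, event)})


def verify_log(log: list, key: bytes) -> bool:
    """Recompute the chain from the start and compare each stored MAC."""
    prev_mac = GENESIS
    for entry in log:
        expected = _mac(key, prev_mac, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

Editing, reordering, or deleting an earlier entry makes `verify_log` return `False`. Dropping entries from the tail is not detected by this sketch; a real implementation would need an extra anchor (for example, a separately stored entry count or final MAC) to catch that.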

Architecture comparison

Paperclip (AI company simulator):

React dashboard
    │
    ▼
LLM coordinator (manages org structure)
    │
    ├── Team A (budget: $50/day)
    │   ├── Manager agent (Claude)
    │   ├── Worker agent (Codex)
    │   └── Worker agent (Cursor)
    │
    └── Team B (budget: $30/day)
        ├── Manager agent (Claude)
        └── Worker agent (OpenClaw)

Heartbeats, goal alignment, governance controls

The coordinator uses LLM calls to manage agent relationships, delegate work through hierarchies, and enforce organizational policies. The metaphor is a company with departments, managers, and employees.

Bernstein (engineering orchestrator):

bernstein -g "goal"  (terminal, CI, SSH)
    │
    ▼
Task server (local FastAPI, deterministic Python)
    │
    ├── Task A → claude  (isolated worktree) → janitor → merge
    ├── Task B → codex   (isolated worktree) → janitor → merge
    └── Task C → gemini  (isolated worktree) → janitor → merge

State: .sdd/ files (backlog, runtime, metrics, config)

The orchestrator is deterministic code. Agents are short-lived processes that execute one task, get verified, and exit. No hierarchies, no org charts — just a task queue and a verification step.
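
The "task queue plus verification step" shape described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Bernstein's code: `Task`, `verify`, and `orchestrate` are hypothetical names, each agent is replaced by a callable that returns the files it wrote, and tasks run one at a time for brevity rather than in parallel worktrees.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Task:
    task_id: str
    agent: str                          # label only, e.g. "claude" or "codex"
    run: Callable[[], Dict[str, str]]   # stand-in for spawning a CLI agent


def verify(output: Dict[str, str]) -> bool:
    """Stand-in check: the agent produced files and none of them is empty."""
    return bool(output) and all(content.strip() for content in output.values())


def orchestrate(tasks: List[Task]) -> Dict[str, List[str]]:
    """Drain a FIFO queue; no model is consulted to decide what happens next."""
    queue: Deque[Task] = deque(tasks)
    merged: List[str] = []
    rejected: List[str] = []
    while queue:
        task = queue.popleft()
        output = task.run()             # the agent does one task, then exits
        if verify(output):
            merged.append(task.task_id)     # a real system would merge a branch here
        else:
            rejected.append(task.task_id)
    return {"merged": merged, "rejected": rejected}
```

Because every branch in `orchestrate` is ordinary control flow, the same task list always produces the same merge/reject decisions for the same agent output.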

The fundamental difference

Paperclip answers: "How do I organize and govern a fleet of AI agents like a company?"

Bernstein answers: "How do I get code shipped faster using multiple agents in parallel?"

These are genuinely different problems. Paperclip cares about organizational structure — who reports to whom, what budget each team has, how goals cascade through a hierarchy. Bernstein cares about engineering output — did the tests pass, did the linter pass, can this branch merge cleanly.

Where Paperclip is better

Organizational modeling. If you're running dozens of AI agents across multiple projects with different budgets, teams, and governance requirements, Paperclip's org-chart model gives you structure that Bernstein doesn't attempt. Bernstein has no concept of "teams" or "reporting lines."

Web UI. Paperclip ships a React dashboard. Bernstein ships both a TUI (`bernstein live`) and a web dashboard (`bernstein dashboard`), but Paperclip's React UI is more polished for non-technical stakeholders, while Bernstein's dashboard is developer-oriented (logs, traces, cost).

Community size. With 55k+ stars, Paperclip is likely to have a larger pool of contributors, integrations, and community support: more eyes on bugs, more plugins, more documentation.

Multi-company support. If you're managing AI agents across multiple organizations (consultancy, agency, MSP use cases), Paperclip has first-class support. Bernstein's multi-repo workspaces are not the same thing.

Budget governance. Paperclip's budget enforcement is hierarchical — team budgets, per-agent limits, approval workflows. Bernstein tracks costs and has a global budget cap, but doesn't model organizational budget hierarchies.
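
To make the distinction concrete, here is a rough sketch contrasting a single global cap with a two-level (team plus per-agent) budget. Both classes are hypothetical illustrations of the two models described above. They are not taken from either tool, and they leave out approval workflows entirely.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GlobalCap:
    """One ceiling for the whole run."""
    cap_usd: float
    spent_usd: float = 0.0

    def try_spend(self, amount: float) -> bool:
        if self.spent_usd + amount > self.cap_usd:
            return False
        self.spent_usd += amount
        return True


@dataclass
class TeamBudget:
    """A team ceiling plus per-agent ceilings; a spend must fit under both."""
    team_cap_usd: float
    agent_caps_usd: Dict[str, float]
    spent_by_agent: Dict[str, float] = field(default_factory=dict)

    def try_spend(self, agent: str, amount: float) -> bool:
        if agent not in self.agent_caps_usd:
            return False                      # unknown agents have no budget
        agent_spent = self.spent_by_agent.get(agent, 0.0)
        team_spent = sum(self.spent_by_agent.values())
        if agent_spent + amount > self.agent_caps_usd[agent]:
            return False                      # per-agent limit hit
        if team_spent + amount > self.team_cap_usd:
            return False                      # team limit hit
        self.spent_by_agent[agent] = agent_spent + amount
        return True
```

With `TeamBudget`, an agent can be refused even though the team still has money (its own cap is exhausted), or refused even though its own cap has room (the team is out). `GlobalCap` has only the one question to ask.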

Where Bernstein is better

Actually shipping code. Bernstein's entire pipeline is optimized for one thing: take a goal, break it into tasks, execute them in parallel, verify the results, merge working code. Git worktree isolation, janitor verification (tests + lint + file checks), and deterministic merge ordering exist because the goal is a working codebase, not an org chart.

Zero LLM overhead on coordination. Paperclip uses LLM calls for coordination: managing hierarchies, routing tasks through org structures, and processing heartbeats. Every coordination decision costs tokens. Bernstein's orchestrator is ~800 lines of deterministic Python, so coordination itself spends no LLM tokens.

Agent breadth. Bernstein has 31 adapters; Paperclip has 4 official integrations. If you use Gemini, OpenAI Agents SDK v2, Aider, Amp, Kilo, Kiro, Qwen, Goose, or OpenCode, Bernstein supports them out of the box as first-class adapters, with no bash/HTTP shims.

Git-native isolation. Each Bernstein agent works in its own git worktree. Conflicts are detected at merge time, not at runtime. Paperclip doesn't provide git-level isolation.
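
Per-agent isolation of this kind rests on the standard `git worktree` subcommand, which checks out a separate branch into a separate directory backed by the same repository. The helper below is a sketch of how such commands might be assembled. The `agent/<task_id>` branch naming, the sibling-directory layout, and the function names are assumptions for illustration, not Bernstein's actual conventions.

```python
import subprocess
from pathlib import Path
from typing import List


def _worktree_path(repo: Path, task_id: str) -> Path:
    # Assumed layout: a sibling directory next to the main checkout.
    return repo.parent / f"{repo.name}-wt-{task_id}"


def worktree_add_cmd(repo: Path, task_id: str, base: str = "HEAD") -> List[str]:
    """Build `git worktree add -b <branch> <path> <base>` for one task."""
    branch = f"agent/{task_id}"
    path = _worktree_path(repo, task_id)
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def worktree_remove_cmd(repo: Path, task_id: str) -> List[str]:
    """Build `git worktree remove <path>` to clean up after the task."""
    return ["git", "-C", str(repo), "worktree", "remove", str(_worktree_path(repo, task_id))]


def run(cmd: List[str]) -> None:
    """Execute a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Each agent then edits files only under its own directory, and overlapping changes surface as ordinary merge conflicts when the `agent/...` branches are merged back.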

Verification before merge. The janitor runs your test suite, linter, and file-existence checks before any agent's work is merged. This is not governance — it's engineering verification. Paperclip's governance controls are organizational (budget, hierarchy), not technical (tests pass).
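
A verification gate like the one described can be approximated as "run some commands, check some files, merge only if nothing failed." The sketch below shows that shape. `verify_worktree` and its parameters are hypothetical names, and which commands to run (a test runner, a linter) is left to the caller rather than assumed.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def run_check(cmd: Sequence[str], cwd: Path) -> bool:
    """A check passes when the command exists and exits with status 0."""
    try:
        return subprocess.run(cmd, cwd=cwd, capture_output=True).returncode == 0
    except FileNotFoundError:
        return False  # the tool is not installed: treat as a failed check


def verify_worktree(
    worktree: Path,
    commands: List[Sequence[str]],
    required_files: List[str],
) -> List[str]:
    """Return failure messages; an empty list means the work may be merged."""
    failures: List[str] = []
    for cmd in commands:
        if not run_check(cmd, worktree):
            failures.append("command failed: " + " ".join(cmd))
    for rel in required_files:
        if not (worktree / rel).is_file():
            failures.append("missing file: " + rel)
    return failures
```

A caller might pass something like `[sys.executable, "-m", "pytest", "-q"]` as one of the commands, assuming pytest is the project's test runner. The gate itself only looks at exit codes and file existence, so it stays deterministic.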

No web UI required. This is a tradeoff, not a pure win. Bernstein's web dashboard is optional, and for engineers working in terminals, SSH sessions, and CI pipelines, a CLI tool that runs without a React app is simpler to deploy and operate.

Self-evolution. `bernstein --evolve` analyzes past run metrics and improves prompts, routing, and templates. Paperclip doesn't have an equivalent.

When to use Paperclip

You're managing AI agents as a business function. Multiple teams, budgets, approval chains, governance requirements. The org-chart metaphor maps to how your organization thinks about AI agent deployment.
You want a visual dashboard. Non-technical stakeholders need to see what agents are doing, what they're costing, and how they're organized.
You need multi-company support. You're a consultancy or MSP deploying AI agents across client organizations.
The problem is coordination, not code. Your agents do diverse work (not just coding), and you need organizational structure around them.

When to use Bernstein

You want to ship code. Your goal is "implement these 10 features in parallel and merge them all into a working branch." Bernstein does this. Paperclip doesn't try to.
You want zero coordination overhead. No LLM tokens spent on figuring out which agent should do what. Deterministic task assignment, deterministic verification.
You use diverse CLI agents. Bernstein has 31 adapters; Paperclip has 4 official integrations. Mix Claude, Codex, OpenAI Agents SDK v2, Gemini, and Aider in the same session without dropping to bash/HTTP shims.
You want git-native safety. Worktree isolation, conflict detection, janitor verification. The output is a tested, linted branch.
You work in terminals. CLI-native, works over SSH, runs in CI, no browser required.
You don't need org charts. If the concept of "reporting lines for AI agents" doesn't map to your problem, you don't need Paperclip's primary feature.

The complementary case

These tools could coexist. Paperclip could manage the organizational layer — which teams exist, what budgets they have, what governance policies apply — while Bernstein handles the engineering execution within each team. Paperclip decides "Team Backend gets $200/day and works on the API refactor." Bernstein takes that goal, spawns 5 agents in isolated worktrees, verifies their output, and merges working code.

This is not a description of a built integration. It is a recognition that "manage AI agents as a company" and "ship code with parallel agents" are different layers of the same stack.