Skip to content

Bernstein Architecture

Overview

Bernstein is a deterministic orchestrator for CLI coding agents. Think Kubernetes for containers, but for AI coding agents: you declare what you want, the control plane schedules it, short-lived agents execute in isolated worktrees, and a janitor verifies the output before anything lands.

The orchestrator is deterministic Python — zero LLM tokens on coordination. Every scheduling decision, every retry, every spawn is auditable code, not a model response.


System diagram

graph TD
    User["User / bernstein.yaml\n(goal, tasks, plan)"]
    TaskServer["Task Server (FastAPI)\nREST API on :8052\n/tasks, /status, /metrics"]
    Orch["Orchestrator\n(tick loop, deterministic)"]
    TP["Tick Pipeline\n(fetch, batch)"]
    TL["Task Lifecycle\n(claim, spawn, complete)"]
    AL["Agent Lifecycle\n(heartbeat, crash, reap)"]
    Spawner["Spawner\n(build prompt, select adapter,\nlaunch in git worktree)"]
    Claude["Claude Adapter\n(and 17 more)"]
    Codex["Codex Adapter"]
    Gemini["Gemini Adapter"]
    WT1["git worktree\n+ role prompt"]
    WT2["git worktree\n+ role prompt"]
    WT3["git worktree\n+ role prompt"]
    QG["Quality Gates\n(lint, type-check, tests, PII)"]
    Janitor["Janitor + Reviewer\n(verify signals, LLM review,\nmark done/fail)"]
    Git["Git (commit / PR / merge)"]

    User --> TaskServer --> Orch
    Orch --> TP & TL & AL
    TP & TL & AL --> Spawner
    Spawner --> Claude & Codex & Gemini
    Claude --> WT1
    Codex --> WT2
    Gemini --> WT3
    WT1 & WT2 & WT3 --> QG --> Janitor --> Git

Why file-based state

Bernstein stores everything in .sdd/ files — no databases, no hidden memory. This is a deliberate design choice:

  • Inspectable: cat .sdd/backlog/open/*.yaml — read task specs as plain text
  • Recoverable: copy .sdd/ to another machine, restart Bernstein, resume
  • Auditable: every metric, trace, and lesson is a JSONL file you can grep
  • Git-friendly: back up .sdd/backlog/ and .sdd/metrics/ alongside your code

Runtime state (.sdd/runtime/) is ephemeral — PIDs, logs, signals. Never commit it.


Package structure

Since v1.6, core/ is organized into 35 sub-packages rather than flat files. Top-level modules like core/server.py and core/orchestrator.py still exist but are thin re-export shims — the real implementation lives in the corresponding sub-package. This keeps import paths stable while allowing each subsystem to grow independently.

Sub-package Responsibility
core/agents/ Spawner, agent lifecycle, heartbeat, signals, discovery, turn state
core/communication/ Bulletin board, notifications, desktop notify, scratchpad
core/config/ Seed parsing, config validation, hot reload, migration
core/cost/ Cost tracking, anomaly detection, budgets, forecasting
core/git/ Git operations, worktrees, merge queue, PR creation
core/grpc_gen/ Generated gRPC stubs
core/knowledge/ Knowledge base, lessons, RAG, semantic cache, embeddings
core/memory/ SQLite-backed memory store
core/observability/ Metrics, circuit breaker, loop detector, log redact, telemetry
core/orchestration/ Orchestrator, tick pipeline, manager, evolution, preflight
core/persistence/ Task store, session, file locks, WAL, store backends
core/planning/ Planner, plan loader, workflow DSL, scenario library
core/plugins_core/ Agency loader, plugin installer, skill discovery
core/protocols/ A2A, ACP, MCP, cluster, SSH backend, gRPC
core/quality/ Janitor, reviewer, quality gates, CI fix, mutation testing
core/routes/ FastAPI route modules
core/routing/ Router, cascade router, model fallback, LLM client
core/security/ Auth, guardrails, approval, compliance, DLP, RBAC
core/server/ Server app factory, middleware, launch, supervisor
core/tasks/ Task models, lifecycle FSM, store, batch, dead-letter queue
core/tokens/ Token monitor, context compression, prompt caching
core/trigger_sources/ GitHub, GitLab, Slack, Discord, file watch, webhook triggers

A few standalone files remain at core/ top level: defaults.py (all configurable constants), credential_scoping.py, and streaming_merge.py.


Core modules

Each module has one responsibility. The split follows the pattern: thin façade shims at core/, isolated logic in sub-packages.

Task Server (core/server.py shim → core/server/)

FastAPI application exposing the REST API. Central coordination point for all agents. State persists to .sdd/runtime/tasks.jsonl as a recovery checkpoint. The server app factory, middleware, and launch logic live in core/server/. Routes are split across core/routes/ — over 40 route modules including tasks.py, status.py, costs.py, agents.py, plans.py, quality.py, graduation.py, slack.py, webhooks.py, dashboard.py, auth.py, observability.py, health.py, gateway.py, and more.

Orchestrator (core/orchestrator.py shim → core/orchestration/)

The public façade. Runs the tick loop: fetch open tasks, batch by role, spawn agents, monitor heartbeats, handle completion. The actual logic lives in core/orchestration/:

  • Tick pipeline (core/orchestration/tick_pipeline.py) — data containers and task fetching
  • Task lifecycle (core/tasks/task_lifecycle.py) — claim, spawn, complete, retry, decompose
  • Agent lifecycle (core/agents/agent_lifecycle.py) — heartbeat, crash detection, reaping, loop/deadlock detection

All task and agent status changes are validated by the Lifecycle Governance Kernel (core/tasks/lifecycle.py), which enforces an explicit FSM transition table and emits typed LifecycleEvent records for audit and replay. See LIFECYCLE.md for the full state diagrams, transition tables, and TransitionReason/AbortReason enumerations.

Spawner (core/agents/spawner.py, shim at core/spawner.py)

Launches CLI agents for task batches. Builds the prompt (system role prompt + task context), selects the appropriate adapter via the registry, and spawns the process inside a git worktree. Core logic split across spawner_core.py, spawner_worktree.py, spawner_merge.py, and spawn_prompt.py in core/agents/. Wraps every command with build_worker_cmd() for process visibility (bernstein ps).

Router (core/routing/router.py)

Routes tasks to the appropriate model and effort level. Tier-aware: knows which providers are free/standard/premium, respects cost optimization, and applies skill-profile routing. Separate from core/routing/cascade_router.py which handles cost-aware cascading (try cheap model first, escalate on failure).

See architecture/model-routing.md for cascade behaviour.

Janitor (core/quality/janitor.py)

Verifies task completion via concrete signals: file exists, glob matches, tests pass, file contains expected content. Moves tasks from claimed/ to done/ or failed/ based on signal results. Does not trust agent claims — verifies them.

Reviewer (core/quality/reviewer.py)

LLM-powered quality review of completed work. Runs after the janitor. Can push corrections back into the queue if the produced code doesn't meet quality standards. Separate concern from janitor: janitor checks signals, reviewer checks quality.

Quality Gates (core/quality/quality_gates.py)

Automated gates that run after task completion: lint, type-check, test suite, PII scan, mutation testing, benchmark regression detection. Configurable in .bernstein/quality_gates.yaml. Blocking or non-blocking modes. Gates execute in parallel via asyncio.gather.

Agent Signals (core/agents/agent_signals.py)

File-based protocol for agent communication: WAKEUP (start work), SHUTDOWN (stop gracefully), HEARTBEAT (still alive). Agents write these to .sdd/runtime/signals/<role>-<session>/. The orchestrator polls them each tick. Circuit breaker writes SHUTDOWN when it detects purpose violations.

Token Monitor (core/tokens/token_monitor.py)

Tracks per-agent token consumption in real time. Detects runaway token growth and triggers auto-intervention (log warning, pause spawning, or kill the expensive agent). Integrates with cost anomaly detection (core/cost/cost_anomaly.py) which uses Z-score analysis on historical spend.


Supporting subsystems

Module (sub-package) What it does
core/observability/circuit_breaker.py Halts agents that repeatedly violate purpose or crash — sends SHUTDOWN signal
core/cost/cost_tracker.py Per-run cost budget tracking with threshold warnings
core/cost/cost_history.py Persisted cost history and alert logic
core/quality/cross_model_verifier.py Routes completed diffs to a different model for independent review
core/communication/bulletin.py Append-only bulletin board for cross-agent communication
core/agents/agent_discovery.py Auto-detect installed CLI agents, check login status, register capabilities
core/agents/agent_lifecycle.py Heartbeat monitoring, stall detection, crash reaping
core/security/approval.py Configurable approval gates between janitor verification and merge
core/quality/ci_fix.py Parse failing CI logs, create fix tasks, route to responsible agent
core/protocols/cluster.py Multi-node coordination: node registration, heartbeats, topology
core/persistence/file_locks.py File-level locking for concurrent agent safety
core/git/git_basic.py Git operations: run, status, staging, committing
core/git/git_ops.py Centralized git write operations for Bernstein
core/git/git_pr.py PR creation and branching operations
core/security/guardrails.py Output guardrails: secret detection, scope enforcement, dangerous operations
core/knowledge/knowledge_base.py Codebase indexing and task context enrichment
core/knowledge/lessons.py Agent lesson propagation — tag-matched, confidence-decayed over time
core/routing/llm.py Async native LLM client for the manager and external models
core/orchestration/manager.py LLM-powered task decomposition and review (splits across manager_models.py, manager_parsing.py, manager_prompts.py)
core/git/merge_queue.py FIFO merge queue for serialized branch merging with conflict routing
core/observability/metric_collector.py Metrics collection and recording
core/observability/metrics.py Performance metrics facade
core/orchestration/multi_cell.py Multi-cell orchestrator — each cell has its own manager + workers
core/communication/notifications.py Webhook notification system for run events
core/security/policy.py Model routing policy: tier optimization and provider routing
core/orchestration/preflight.py Pre-flight checks: validate CLI, API key, port availability
core/quality/quality_gates.py Automated quality gates: lint, type-check, test gates
core/security/quarantine.py Cross-run task quarantine — track repeatedly-failing tasks
core/observability/rate_limit_tracker.py Per-provider throttle tracking and 429 detection
core/config/seed.py Seed file parser for bernstein.yaml
core/persistence/session.py Session state persistence for fast resume after stop/restart
core/communication/signals.py Pivot signal system for strategic re-evaluation
core/persistence/store.py / store_postgres.py / store_redis.py Pluggable storage backends
core/persistence/sync.py Sync .sdd/backlog/*.yaml with the task server
core/tasks/task_store.py Thread-safe in-memory task store with JSONL persistence
core/config/upgrade_executor.py Autonomous upgrade executor with transaction-like safety
core/orchestration/worker.py bernstein-worker: visible process wrapper for spawned CLI agents
core/persistence/workspace.py Multi-repo workspace orchestration
core/git/worktree.py Git worktree lifecycle for agent session isolation
core/tokens/context_degradation_detector.py Monitor agent quality over time; restart when degraded
core/observability/loop_detector.py Agent loop and file-lock deadlock detection
core/observability/log_redact.py PII redaction filter installed globally at bootstrap
core/cost/cost_anomaly.py Cost anomaly detection with Z-score signaling
core/defaults.py All configurable constants (timeouts, thresholds, tuning)

Data flow

1. User creates task → Task Server (POST /tasks or bernstein.yaml)
2. Orchestrator tick loop fetches open tasks via tick pipeline
3. Router assigns model and effort per task properties
4. Spawner launches agents in isolated git worktrees
5. Agents work in parallel, writing heartbeats and signal files
6. Agent completes task → git commit in worktree
7. Janitor verifies completion signals (files exist, tests pass)
8. Quality gates run (lint, type-check, PII scan)
9. Reviewer optionally performs LLM quality review
10. Metrics recorded to .sdd/metrics/*.jsonl
11. Task marked done or failed

Sandbox, storage, and skills (1.9.x)

Three pluggable subsystems landed in the 1.9 series. Each has its own dedicated architecture page:

  • Sandbox backends — pluggable SandboxBackend / SandboxSession protocol (oai-002). First-party backends: local git worktree (default), docker, e2b (Firecracker microVMs), and modal (serverless containers with optional GPU). Third parties register through the bernstein.sandbox_backends entry-point group; bernstein agents sandbox-backends lists every installed backend.
  • Artifact storage sinks — async ArtifactSink protocol (oai-003) that decouples .sdd/ persistence from the local filesystem. First-party sinks cover local_fs, s3, gcs, azure_blob, and r2. BufferedSink preserves the WAL crash-safety contract by fsyncing locally first and mirroring the payload to the remote asynchronously. Third parties extend via bernstein.storage_sinks.
  • Progressive skill packs — OpenAI Agents SDK "Skills" pattern (oai-004): only a compact index ships in every spawn's system prompt; agents pull full bodies on demand via the load_skill MCP tool. Plugins register additional skill sources under bernstein.skill_sources. Inspect available skills with bernstein skills list / bernstein skills show <name>.

Adapter architecture

All adapters implement the CLIAdapter ABC from adapters/base.py:

class CLIAdapter(ABC):
    @abstractmethod
    def spawn(self, *, prompt, workdir, model_config, session_id, mcp_config=None) -> SpawnResult: ...
    @abstractmethod
    def is_alive(self, pid: int) -> bool: ...
    @abstractmethod
    def kill(self, pid: int) -> None: ...
    @abstractmethod
    def name(self) -> str: ...
    def detect_tier(self) -> ApiTierInfo | None: ...  # optional

The CachingAdapter wrapper in adapters/caching_adapter.py transparently deduplicates system prompt prefixes across agents, saving tokens on repeated spawns.

Adapters must use build_worker_cmd() for process visibility — this sets the process title and writes the PID metadata file that bernstein ps reads.


Key design decisions

Decision Why
Short-lived agents No persistent processes to manage. Spawn per task batch, exit when done. No "sleep" problem.
File-based state .sdd/ is git-friendly, inspectable, recoverable. No hidden databases.
Deterministic orchestrator Scheduling is code, not LLM. Predictable, auditable, testable.
Agent-agnostic Works with any CLI agent. No vendor lock-in.
Git worktree isolation Main branch never dirty. Each agent works on its own branch.
Janitor verification Concrete signals, not trust. Tests must pass, files must exist.
Branch is main Never master. PRs target main. CI enforces this.
OOP where useful, pure funcs where better Small classes for stateful collaborators; pure functions for deterministic transforms.

Cloudflare cloud execution

Bernstein can execute agents on Cloudflare's edge infrastructure in addition to local processes. The integration provides:

  • RuntimeBridge (bridges/cloudflare.py) — spawn agents on Workers + Durable Objects
  • WorkflowBridge (bridges/cloudflare_workflow.py) — durable multi-step workflows with auto-retry and approval gates
  • SandboxBridge (bridges/cloudflare_sandbox.py) — isolated V8 or container execution for untrusted code
  • BrowserRenderingBridge (bridges/browser_rendering.py) — headless web browsing for agents
  • R2WorkspaceSync (bridges/r2_sync.py) — content-addressed workspace file sync via R2
  • WorkersAIProvider (core/routing/cloudflare_ai.py) — free-tier LLM models for planning
  • D1AnalyticsClient (core/cost/d1_analytics.py) — usage metering and billing
  • VectorizeSemanticCache (core/memory/vectorize_cache.py) — semantic LLM response caching

The cloud bridges implement the same RuntimeBridge interface as local execution, so the orchestrator remains agnostic to where agents run. See the Cloudflare Overview for architecture diagrams and setup instructions.