Deployment Guide

This guide explains how to run Bernstein in different environments. Each section is self-contained and includes complete configuration examples.

Jump to:

  • Local development
  • CI/CD — GitHub Actions
  • CI/CD — GitLab CI
  • Docker single-host
  • Docker Compose cluster
  • Kubernetes / Helm
  • Cloudflare cloud deployment
  • Team shared server (bare metal)
  • Environment variable reference
  • Zero-downtime upgrades (blue-green)
  • Upgrading
  • Troubleshooting deployments

New to Bernstein? Complete the Quickstart Tutorial first to see orchestration in action before deploying.


Local development

Prerequisites

  • Python 3.12+
  • Git
  • uv (recommended) or pip
  • At least one supported CLI agent installed (Claude Code, Codex, Gemini CLI, etc.)
  • An API key for at least one LLM provider

Install

# With uv (recommended — faster, handles virtualenvs automatically)
uv pip install bernstein

# Or install from source
git clone https://github.com/bernstein-ai/bernstein
cd bernstein
uv pip install -e .

# Or with pip
pip install bernstein

First run

# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."   # Claude
# export OPENAI_API_KEY="sk-..."        # GPT / Codex
# export GOOGLE_API_KEY="..."           # Gemini

# Initialize a project
cd /path/to/your/project
bernstein init

# Run a plan
bernstein run plans/my-project.yaml

# Or run interactively (type a goal, Bernstein decomposes it)
bernstein run

The task server starts on http://127.0.0.1:8052. State is stored in .sdd/ in the working directory. Add .sdd/ to .gitignore.
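
The .gitignore step can be scripted when bootstrapping many projects. The following is a minimal sketch, not part of Bernstein: the helper name ensure_ignored is hypothetical, and it only appends the .sdd/ entry when it is not already listed.

```python
from pathlib import Path


def ensure_ignored(project_root: Path, entry: str = ".sdd/") -> bool:
    """Append `entry` to .gitignore if it is missing.

    Returns True when the file was changed, False when the entry
    was already present.
    """
    gitignore = project_root / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if any(line.strip() == entry for line in lines):
        return False
    lines.append(entry)
    gitignore.write_text("\n".join(lines) + "\n")
    return True
```

Calling ensure_ignored(Path("/path/to/your/project")) twice is safe: the second call leaves the file untouched.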

Local configuration file

Create bernstein.yaml in your project root:

# bernstein.yaml
cli: auto               # auto-detect installed agent (claude|codex|gemini|qwen)
model: sonnet           # default model
max_agents: 4           # concurrent agents; tune based on your API tier
budget: 5.00            # hard spending cap in USD (optional)

# Override model per role
role_model_policy:
  docs:
    model: haiku
  backend:
    model: sonnet
  architect:
    model: opus

# Share context files with all agents
context_files:
  - README.md
  - docs/architecture/ARCHITECTURE.md
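
To make the per-role override concrete, here is a small sketch of how such a lookup could behave. It mirrors the configuration above as a plain dict and assumes that a role without an entry in role_model_policy falls back to the top-level model; the function name model_for_role is hypothetical and Bernstein's own resolution logic may differ.

```python
# Mirrors the relevant keys of the bernstein.yaml example as a plain dict.
CONFIG = {
    "model": "sonnet",
    "role_model_policy": {
        "docs": {"model": "haiku"},
        "backend": {"model": "sonnet"},
        "architect": {"model": "opus"},
    },
}


def model_for_role(config: dict, role: str) -> str:
    """Return the per-role model if one is configured, else the default."""
    policy = config.get("role_model_policy", {})
    # Assumption: roles without an override use the top-level `model`.
    return policy.get(role, {}).get("model", config["model"])
```

With this config, model_for_role(CONFIG, "architect") gives "opus", while an unlisted role such as "qa" gives the default "sonnet".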

Verify the install

bernstein doctor        # checks dependencies, API keys, git setup
bernstein status        # shows task server state

Smoke test

Run the zero-config demo to confirm everything works end-to-end before using it on real code:

bernstein quickstart --keep    # runs 3 tasks on a demo project, keeps output

Expected: all 3 tasks complete and a summary table prints with elapsed time and cost.


CI/CD — GitHub Actions

Single-shot plan execution

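Run a plan file on demand. The workflow below is triggered manually (workflow_dispatch); to run it on every push as well, add a push trigger and give the plan path a fallback, because inputs are empty on push events: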

# .github/workflows/bernstein.yml
name: Bernstein Agent Run
on:
  workflow_dispatch:
    inputs:
      plan:
        description: "Plan file (relative to repo root)"
        required: true
        default: "plans/ci-tasks.yaml"
      max_agents:
        description: "Max concurrent agents"
        required: false
        default: "2"

jobs:
  bernstein:
    runs-on: ubuntu-latest
    timeout-minutes: 60

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0        # full history needed for worktree operations

      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"

      - name: Install Bernstein
        run: pip install bernstein

      - name: Install Claude Code (or your preferred CLI agent)
        run: npm install -g @anthropic-ai/claude-code
        # Alternatively, for the Codex CLI:
        # run: npm install -g @openai/codex

      - name: Run plan
        run: |
          bernstein run ${{ inputs.plan }} \
            --max-agents ${{ inputs.max_agents }} \
            --budget 10.00
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          # OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          BERNSTEIN_LOG_JSON: "true"
          BERNSTEIN_NO_TUI: "true"    # disable interactive TUI in CI

      - name: Upload state artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: bernstein-state-${{ github.run_id }}
          path: .sdd/
          retention-days: 7

      - name: Post summary
        if: always()
        run: bernstein report --format markdown >> $GITHUB_STEP_SUMMARY

Using the official GitHub Action

If the Bernstein GitHub Action is available in the marketplace:

# .github/workflows/bernstein-action.yml
name: Bernstein (Action)
on:
  push:
    branches: [main]

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bernstein-ai/bernstein-action@v1
        with:
          plan: plans/ci-tasks.yaml
          max-agents: 2
          budget: 5.00
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

See docs/integrations/github-action.md for the full parameter reference.

Storing secrets

# Add secrets to your repository
gh secret set ANTHROPIC_API_KEY --body "sk-ant-..."
gh secret set OPENAI_API_KEY --body "sk-..."

Never commit API keys to your repository. Use GitHub Secrets for all provider credentials.

Verify CI output

A successful run uploads the .sdd/ state artifact and posts a markdown summary. In the Actions UI you should see:

  • A green bernstein job
  • .sdd/ uploaded under Artifacts
  • A step summary with task counts, duration, and cost

If the job fails with a non-zero exit, check the "Run plan" step logs. The most common causes are missing secrets and expired API keys.


CI/CD — GitLab CI

Basic pipeline stage

# .gitlab-ci.yml
bernstein:
  image: python:3.12-slim
  stage: build
  timeout: 60 minutes

  before_script:
    - apt-get update -q && apt-get install -y -q git npm
    - pip install bernstein
    - npm install -g @anthropic-ai/claude-code

  script:
    - bernstein run plans/ci-tasks.yaml --max-agents 2 --budget 10.00

  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY   # set in GitLab CI/CD settings
    BERNSTEIN_LOG_JSON: "true"
    BERNSTEIN_NO_TUI: "true"    # disable interactive TUI in CI

  artifacts:
    paths:
      - .sdd/
    expire_in: 7 days
    when: always

Caching pip packages between runs

bernstein:
  image: python:3.12-slim
  stage: build

  cache:
    key: bernstein-pip-$CI_COMMIT_REF_SLUG
    paths:
      - .pip-cache/

  before_script:
    - apt-get update -q && apt-get install -y -q git npm
    - pip install --cache-dir .pip-cache bernstein
    - npm install -g @anthropic-ai/claude-code

  script:
    - bernstein run plans/ci-tasks.yaml --max-agents 2

  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
    BERNSTEIN_NO_TUI: "true"
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.pip-cache"

Multi-project pipeline (scheduled)

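# .gitlab-ci.yml (runs on a pipeline schedule, e.g. weekly)
weekly-refactor:
  image: python:3.12-slim
  stage: maintenance          # must be listed under the top-level stages: key
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  before_script:
    # The slim image ships without git or a CLI agent
    - apt-get update -q && apt-get install -y -q git npm
    - npm install -g @anthropic-ai/claude-code
  script:
    - pip install bernstein
    - bernstein run plans/weekly-maintenance.yaml --max-agents 3
  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
    BERNSTEIN_NO_TUI: "true"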

Protected variables (GitLab secrets)

  1. Go to Settings → CI/CD → Variables.
  2. Add ANTHROPIC_API_KEY with Protected and Masked flags enabled.
  3. The variable is then available only to pipelines that run on protected branches and tags.

Docker single-host

.env file

Copy the example and fill in your keys:

# .env  (never commit this file)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
BERNSTEIN_AUTH_TOKEN=change-me-in-production
BERNSTEIN_MAX_AGENTS=4
BERNSTEIN_DASHBOARD_PASSWORD=change-me

Dockerfile

FROM python:3.12-slim

WORKDIR /workspace

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    npm \
    && rm -rf /var/lib/apt/lists/*

# Install a CLI agent
RUN npm install -g @anthropic-ai/claude-code

# Install Bernstein
RUN pip install --no-cache-dir bernstein

# Non-root user for security
RUN useradd -m -u 1000 bernstein && chown -R bernstein /workspace
USER bernstein

# State directory
VOLUME ["/workspace/.sdd"]

ENV BERNSTEIN_BIND_HOST=0.0.0.0
ENV BERNSTEIN_PORT=8052
ENV BERNSTEIN_NO_TUI=true

EXPOSE 8052

HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD curl -f http://localhost:8052/health || exit 1

CMD ["bernstein", "conduct"]

# Build the image and start the container
docker build -t bernstein:latest .
docker run -d \
  --name bernstein \
  --env-file .env \
  -p 8052:8052 \
  -v bernstein-state:/workspace/.sdd \
  -v "$(pwd)":/workspace/project \
  bernstein:latest

Docker Compose cluster

The included docker-compose.yaml runs a full cluster: task server, orchestrator, scalable workers, PostgreSQL, Redis, Prometheus, and Grafana.

Setup

# Copy and edit the env file
cp .env.example .env
# Edit .env: set ANTHROPIC_API_KEY and BERNSTEIN_AUTH_TOKEN

# Start the full stack
docker compose up -d

# Scale workers (each worker claims tasks from the shared server)
docker compose up -d --scale bernstein-worker=4

# View logs
docker compose logs -f bernstein-server
docker compose logs -f bernstein-orchestrator

Service endpoints

| Service | URL | Purpose |
|---|---|---|
| Task server + dashboard | http://localhost:8052/dashboard | Web UI, task management |
| Task server API | http://localhost:8052 | REST API |
| Prometheus | http://localhost:9090 | Metrics |
| Grafana | http://localhost:3000 | Agent dashboards (admin/admin) |

Stopping and cleanup

docker compose down          # stop containers, keep volumes
docker compose down -v       # stop and delete all data volumes

Verify the cluster is healthy

# All containers should be "Up"
docker compose ps

# Task server health endpoint
curl http://localhost:8052/health

# Check the dashboard
open http://localhost:8052/dashboard

Expected health response:

{"status": "ok", "tasks_open": 0, "tasks_claimed": 0, "tasks_done": 0}
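
When the health check runs from a script rather than by hand, the response can be parsed instead of eyeballed. This sketch assumes only the four fields shown in the expected response above; summarize_health is a hypothetical helper, and how the body is fetched (curl, urllib, and so on) is left to the caller.

```python
import json


def summarize_health(body: str) -> str:
    """Turn a /health JSON body into a one-line summary."""
    data = json.loads(body)
    if data.get("status") != "ok":
        return f"unhealthy: status={data.get('status')!r}"
    # Missing counters are treated as 0 so a partial body still summarizes.
    return "ok: {} open, {} claimed, {} done".format(
        data.get("tasks_open", 0),
        data.get("tasks_claimed", 0),
        data.get("tasks_done", 0),
    )
```

Feeding it the expected response above yields "ok: 0 open, 0 claimed, 0 done".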


Kubernetes / Helm

Prerequisites

  • Kubernetes 1.24+
  • Helm 3.x
  • kubectl configured for your cluster
  • Persistent storage (EBS, NFS, local-path provisioner, etc.)

Install with Helm

# From the local chart
helm install bernstein ./deploy/helm/bernstein \
  --namespace bernstein \
  --create-namespace \
  -f my-values.yaml

# Or add the Helm repo (when published)
helm repo add bernstein https://charts.bernstein.dev
helm repo update
helm install bernstein bernstein/bernstein \
  --namespace bernstein \
  --create-namespace

Provider API keys (Kubernetes secret)

kubectl create secret generic bernstein-provider-keys \
  --namespace bernstein \
  --from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
  --from-literal=OPENAI_API_KEY="sk-..."

values.yaml — complete example

# my-values.yaml
image:
  repository: bernstein
  tag: latest
  pullPolicy: IfNotPresent

server:
  replicaCount: 1
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "4Gi"
      cpu: "2000m"
  persistence:
    enabled: true
    storageClass: ""     # use cluster default
    size: 10Gi
  service:
    type: ClusterIP
    port: 8052

worker:
  replicaCount: 2
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "8Gi"
      cpu: "4000m"
  persistence:
    enabled: true
    storageClass: ""
    size: 20Gi           # larger: holds git worktrees
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 20
    targetQueueDepth: "2"           # scale up when 2+ tasks per worker
    targetCPUUtilizationPercentage: 70

providerKeys:
  existingSecret: bernstein-provider-keys

auth:
  enabled: true
  # existingSecret: my-auth-secret   # use an existing secret instead

config:
  maxAgents: 6
  logLevel: INFO
  clusterEnabled: true

monitoring:
  prometheus:
    enabled: true
  grafana:
    enabled: true
    adminPassword: "change-me"

# Install with the values file
helm install bernstein ./deploy/helm/bernstein \
  --namespace bernstein \
  --create-namespace \
  -f my-values.yaml

# Verify
kubectl get pods -n bernstein
kubectl port-forward -n bernstein svc/bernstein-server 8052:8052

For the full Helm chart parameter reference, see docs/operations/HELM_DEPLOYMENT.md.
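
To get a feel for the worker autoscaling values in the example (targetQueueDepth, minReplicas, maxReplicas), the arithmetic can be sketched as below. This is an assumption about how a queue-depth target is typically applied, namely one worker per targetQueueDepth open tasks, clamped to the replica bounds. The chart's real autoscaler may compute this differently and also takes CPU utilization into account.

```python
import math


def desired_workers(
    open_tasks: int,
    target_queue_depth: int = 2,
    min_replicas: int = 1,
    max_replicas: int = 20,
) -> int:
    """Estimate worker replicas for a given backlog (illustrative only)."""
    wanted = math.ceil(open_tasks / target_queue_depth)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, wanted))
```

Under these assumptions, 5 open tasks call for 3 workers, an empty queue keeps the minimum of 1, and 100 open tasks are capped at 20.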

Raw Kubernetes manifests (without Helm)

# bernstein.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bernstein
  namespace: bernstein
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bernstein
  template:
    metadata:
      labels:
        app: bernstein
    spec:
      containers:
        - name: bernstein
          image: bernstein:latest
          ports:
            - containerPort: 8052
              name: http
            - containerPort: 9090
              name: metrics
          env:
            - name: BERNSTEIN_BIND_HOST
              value: "0.0.0.0"
            - name: BERNSTEIN_CLUSTER_ENABLED
              value: "true"
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: bernstein-provider-keys
                  key: ANTHROPIC_API_KEY
          volumeMounts:
            - name: state
              mountPath: /workspace/.sdd
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
      volumes:
        - name: state
          persistentVolumeClaim:
            claimName: bernstein-state
---
apiVersion: v1
kind: Service
metadata:
  name: bernstein
  namespace: bernstein
spec:
  selector:
    app: bernstein
  ports:
    - port: 8052
      targetPort: http
      name: http
    - port: 9090
      targetPort: metrics
      name: metrics
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bernstein-state
  namespace: bernstein
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

# Apply the manifests and verify
kubectl apply -f bernstein.yaml
kubectl get pods -n bernstein

Cloudflare cloud deployment

Bernstein can execute agents on Cloudflare's edge infrastructure instead of local processes. This is useful for teams that want centralized billing, isolated sandboxes for untrusted code, or global low-latency agent dispatch.

Quick start

# 1. Install wrangler
npm install -g wrangler
wrangler login

# 2. Deploy the agent Worker
bernstein cloud deploy --worker-name bernstein-agent

# 3. Authenticate with Bernstein Cloud
bernstein cloud login

# 4. Run orchestration in the cloud
bernstein cloud run "Add OAuth2 authentication" --max-agents 5 --budget 25.00

What runs where

| Component | Location | Purpose |
|---|---|---|
| Orchestrator | Local or your server | Deterministic tick loop, task scheduling |
| Agent execution | Cloudflare Workers / Sandboxes | Code generation, testing |
| Workspace files | Cloudflare R2 | File sync between local and cloud agents |
| Analytics / billing | Cloudflare D1 | Usage metering, quota enforcement |
| LLM response cache | Cloudflare Vectorize | Semantic prompt deduplication |
| Internal LLM (planning) | Cloudflare Workers AI | Free-tier models for task decomposition |

Environment variables for Cloudflare

| Variable | Description |
|---|---|
| CLOUDFLARE_ACCOUNT_ID | Cloudflare account identifier |
| CLOUDFLARE_API_TOKEN | API token with Workers, R2, D1, Vectorize permissions |
| BERNSTEIN_CLOUD_API_KEY | API key for bernstein.run hosted service |

For the full Cloudflare setup guide including R2 buckets, D1 databases, and Vectorize indexes, see the Cloudflare Setup documentation.


Team shared server

Run Bernstein on a dedicated server that multiple developers share. Each developer points their local tools at the shared task server.

Server setup (systemd)

# Create a system user
sudo useradd -r -s /bin/bash -d /opt/bernstein bernstein
sudo mkdir -p /opt/bernstein/workspace /var/lib/bernstein/.sdd
sudo chown -R bernstein:bernstein /opt/bernstein /var/lib/bernstein

# Install into a virtualenv
sudo -u bernstein python3.12 -m venv /opt/bernstein/venv
sudo -u bernstein /opt/bernstein/venv/bin/pip install bernstein

# Install a CLI agent (system-wide or in the venv)
npm install -g @anthropic-ai/claude-code

# /etc/systemd/system/bernstein.service
[Unit]
Description=Bernstein Orchestrator
After=network.target
Wants=network.target

[Service]
Type=simple
User=bernstein
Group=bernstein
WorkingDirectory=/opt/bernstein/workspace

ExecStart=/opt/bernstein/venv/bin/bernstein conduct
ExecStop=/opt/bernstein/venv/bin/bernstein stop --hard

Restart=on-failure
RestartSec=10

# Secrets — set via EnvironmentFile in production
EnvironmentFile=/etc/bernstein/env
Environment=BERNSTEIN_SDD_DIR=/var/lib/bernstein/.sdd
Environment=BERNSTEIN_BIND_HOST=0.0.0.0
Environment=BERNSTEIN_PORT=8052
Environment=BERNSTEIN_LOG_JSON=true
Environment=BERNSTEIN_MAX_AGENTS=8

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
PrivateTmp=true
ReadWritePaths=/var/lib/bernstein /opt/bernstein/workspace

[Install]
WantedBy=multi-user.target

# /etc/bernstein/env  (mode 0600, owned by bernstein)
ANTHROPIC_API_KEY=sk-ant-...
BERNSTEIN_AUTH_TOKEN=strong-random-secret
BERNSTEIN_DASHBOARD_PASSWORD=another-strong-password

# Enable and start the service, then follow its logs
sudo systemctl daemon-reload
sudo systemctl enable bernstein
sudo systemctl start bernstein
sudo journalctl -u bernstein -f

Connecting as a team member

Each developer configures their local bernstein.yaml to point at the shared server:

# bernstein.yaml (developer's local project)
server_url: http://bernstein.internal:8052
# or via env: BERNSTEIN_SERVER_URL=http://bernstein.internal:8052

# Submit tasks to the shared server without running a local orchestrator
bernstein task add "Implement login page" --role frontend --priority 2
bernstein status   # see what the shared server is running
bernstein ps       # list active agents
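
Since the server URL can come from either bernstein.yaml or the environment, it helps to be explicit about which one wins. The sketch below assumes the environment variable takes precedence over the file and that the documented default applies when neither is set; this precedence is an assumption, not confirmed Bernstein behavior, and resolve_server_url is a hypothetical name.

```python
import os

DEFAULT_SERVER_URL = "http://127.0.0.1:8052"  # documented default


def resolve_server_url(config: dict, env=None) -> str:
    """Pick the task server URL: env var, then config file, then default."""
    env = os.environ if env is None else env
    return (
        env.get("BERNSTEIN_SERVER_URL")      # assumed to win over the file
        or config.get("server_url")          # value from bernstein.yaml
        or DEFAULT_SERVER_URL
    )
```

If the two sources disagree on a developer machine, running bernstein status shows which server is actually being contacted.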

Reverse proxy (nginx)

Expose the dashboard behind TLS:

# /etc/nginx/sites-enabled/bernstein
server {
    listen 443 ssl;
    server_name bernstein.internal;

    ssl_certificate     /etc/ssl/certs/bernstein.crt;
    ssl_certificate_key /etc/ssl/private/bernstein.key;

    # Dashboard
    location / {
        proxy_pass http://127.0.0.1:8052;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;

        # Required for SSE (live dashboard streaming)
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 600s;
    }
}

Multi-project setup

Run separate orchestrators on different ports for project isolation:

# /etc/systemd/system/bernstein@.service  (template unit)
[Unit]
Description=Bernstein Orchestrator (%i)
After=network.target

[Service]
Type=simple
User=bernstein
WorkingDirectory=/opt/bernstein/projects/%i
EnvironmentFile=/etc/bernstein/%i.env
ExecStart=/opt/bernstein/venv/bin/bernstein conduct

[Install]
WantedBy=multi-user.target

# Start project-a on port 8052 and project-b on port 8053
sudo systemctl start bernstein@project-a
sudo systemctl start bernstein@project-b
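
The template unit reads its port from /etc/bernstein/%i.env, so each project needs its own env file with a distinct BERNSTEIN_PORT. A minimal sketch for generating those files follows; the helper name is hypothetical, and a real env file would also carry the secrets and any other per-project settings.

```python
from pathlib import Path


def write_project_envs(
    projects: list[str], env_dir: Path, base_port: int = 8052
) -> dict[str, int]:
    """Write <project>.env files with consecutive ports; return the mapping."""
    ports: dict[str, int] = {}
    for offset, name in enumerate(projects):
        port = base_port + offset
        # Only the port is written here; add secrets separately (mode 0600).
        (env_dir / f"{name}.env").write_text(f"BERNSTEIN_PORT={port}\n")
        ports[name] = port
    return ports
```

Called with ["project-a", "project-b"], this assigns ports 8052 and 8053, matching the example above.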

Environment variables

| Variable | Default | Description |
|---|---|---|
| ANTHROPIC_API_KEY | | Claude API key |
| OPENAI_API_KEY | | OpenAI / Codex API key |
| GOOGLE_API_KEY | | Gemini API key |
| BERNSTEIN_SERVER_URL | http://127.0.0.1:8052 | Task server URL (for remote workers) |
| BERNSTEIN_BIND_HOST | 127.0.0.1 | Server bind address |
| BERNSTEIN_PORT | 8052 | Server port |
| BERNSTEIN_MAX_AGENTS | 6 | Max concurrent agents |
| BERNSTEIN_AUTH_TOKEN | | Inter-node auth secret (cluster mode) |
| BERNSTEIN_DASHBOARD_PASSWORD | | Dashboard HTTP auth password |
| BERNSTEIN_STORAGE_BACKEND | memory | memory, postgres, or redis |
| BERNSTEIN_DATABASE_URL | | PostgreSQL DSN (e.g. postgresql://user:pass@host/db) |
| BERNSTEIN_REDIS_URL | | Redis URL (e.g. redis://localhost:6379/0) |
| BERNSTEIN_CLUSTER_ENABLED | false | Enable multi-node cluster mode |
| BERNSTEIN_LOG_LEVEL | INFO | Log verbosity (DEBUG/INFO/WARNING/ERROR) |
| BERNSTEIN_LOG_JSON | false | Emit JSON log lines (for log aggregators) |
| BERNSTEIN_BUDGET | | Hard spending cap in USD |
| BERNSTEIN_TICK_INTERVAL | 5 | Orchestrator tick interval in seconds |
| BERNSTEIN_SKIP_GATES | | Skip quality gates (requires BERNSTEIN_SKIP_GATE_REASON) |
| BERNSTEIN_NO_TUI | | Disable interactive TUI (useful in CI) |
| BERNSTEIN_QUIET | | Suppress all non-error output |
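
For wrapper scripts that need the same defaults, the table can be encoded as a small settings loader. This is a sketch covering only some of the variables; the Settings class, the accepted boolean spellings, and the validation rules are assumptions for illustration and do not describe how Bernstein parses its environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    bind_host: str
    port: int
    max_agents: int
    storage_backend: str
    cluster_enabled: bool
    tick_interval: int


def _flag(value: str) -> bool:
    # Assumption: these spellings count as "true".
    return value.strip().lower() in {"1", "true", "yes"}


def load_settings(env=None) -> Settings:
    """Read a subset of BERNSTEIN_* variables, applying the table defaults."""
    env = os.environ if env is None else env
    if _flag(env.get("BERNSTEIN_SKIP_GATES", "")) and not env.get(
        "BERNSTEIN_SKIP_GATE_REASON"
    ):
        raise ValueError("BERNSTEIN_SKIP_GATES requires BERNSTEIN_SKIP_GATE_REASON")
    backend = env.get("BERNSTEIN_STORAGE_BACKEND", "memory")
    if backend not in {"memory", "postgres", "redis"}:
        raise ValueError(f"unknown storage backend: {backend!r}")
    return Settings(
        bind_host=env.get("BERNSTEIN_BIND_HOST", "127.0.0.1"),
        port=int(env.get("BERNSTEIN_PORT", "8052")),
        max_agents=int(env.get("BERNSTEIN_MAX_AGENTS", "6")),
        storage_backend=backend,
        cluster_enabled=_flag(env.get("BERNSTEIN_CLUSTER_ENABLED", "false")),
        tick_interval=int(env.get("BERNSTEIN_TICK_INTERVAL", "5")),
    )
```

Passing an explicit dict instead of relying on os.environ makes the loader easy to exercise in tests.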

Zero-downtime upgrades (blue-green)

Bernstein supports blue-green deployments to upgrade the server without dropping in-flight tasks. The mechanism swaps the .sdd/ symlink between two parallel state directories (.sdd-blue/ and .sdd-green/), letting the new version warm up before traffic switches.

How it works

.sdd/  →  .sdd-blue/    ← current live state
          .sdd-green/   ← new version (being prepared)

On switch_traffic(), the .sdd/ symlink is atomically re-pointed at .sdd-green/. If the health check fails, rollback() re-points it back to .sdd-blue/.
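
For readers unfamiliar with atomic symlink swaps, the usual POSIX technique is to create the new link under a temporary name and rename it over the old one. The sketch below illustrates that technique with the standard library only; it is not Bernstein's implementation of switch_traffic(), and repoint_symlink is a hypothetical name.

```python
import os
from pathlib import Path


def repoint_symlink(link: Path, target: Path) -> None:
    """Atomically make `link` point at `target` (POSIX semantics)."""
    tmp = link.with_name(link.name + ".tmp")
    # Clear a leftover temp link from an interrupted earlier swap.
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(target, tmp)
    # rename(2) replaces the old link in one step, so readers see either
    # the old target or the new one, never a missing .sdd/.
    os.replace(tmp, link)
```

A rollback is then the same call with the blue directory as the target, for example repoint_symlink(Path(".sdd"), Path(".sdd-blue")).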

Python API

from pathlib import Path
from bernstein.core.blue_green import BlueGreenConfig, BlueGreenDeployment

cfg = BlueGreenConfig(
    health_check_url="http://127.0.0.1:8052/status",
    rollback_on_error=True,
    switch_delay_seconds=10,
)
deploy = BlueGreenDeployment(cfg, base_dir=Path("."))

# 1. Prepare the green environment with the new version
green_path = deploy.prepare_green("2.1.0")

# 2. Start the new server process pointing at green_path
# ... start bernstein with BERNSTEIN_SDD_DIR=green_path ...

# 3. Check health
if deploy.health_check():
    deploy.switch_traffic()   # symlink: .sdd/ → .sdd-green/
else:
    deploy.rollback()         # stays on blue; green is discarded

Upgrade procedure (bare metal)

# 1. Install the new version alongside the old
pip install bernstein==2.1.0 --target /opt/bernstein/v2.1.0

# 2. Start the new server on a staging port
BERNSTEIN_PORT=8053 BERNSTEIN_SDD_DIR=.sdd-green \
  /opt/bernstein/v2.1.0/bin/bernstein conduct &

# 3. Verify it is healthy
curl http://127.0.0.1:8053/status

# 4. Switch traffic via the Python API or CLI
python3 -c "
from pathlib import Path
from bernstein.core.blue_green import BlueGreenConfig, BlueGreenDeployment
cfg = BlueGreenConfig(health_check_url='http://127.0.0.1:8053/status')
BlueGreenDeployment(cfg, Path('.')).switch_traffic()
"

# 5. Stop the old server
kill $(cat .sdd-blue/runtime/server.pid)

Check deployment status

status = deploy.status()
print(status.active)         # "blue" or "green"
print(status.blue_version)   # "2.0.0"
print(status.green_version)  # "2.1.0"
print(status.healthy)        # True / False

Upgrading

  1. Stop the running instance: bernstein stop
  2. Back up state: cp -r .sdd .sdd.backup-$(date +%Y%m%d)
  3. Install the new version: pip install --upgrade bernstein
  4. Start: bernstein run

State format is forward-compatible between minor versions. For major version upgrades, check docs/migrations/migration-guides.md for breaking changes.

To roll back, run pip install bernstein==<previous-version> and restore the dated backup created in step 2 (.sdd.backup-<date>).

For zero-downtime upgrades on production servers, use the blue-green procedure above.


Troubleshooting deployments

Task server health check fails on startup

The server may be waiting for PostgreSQL or Redis to be ready. Check dependencies first:

# Docker Compose
docker compose logs postgres
docker compose logs redis

# Kubernetes
kubectl logs -n bernstein -l app.kubernetes.io/component=postgresql
kubectl get events -n bernstein --sort-by='.lastTimestamp'

If the server crashes immediately, check the server log directly:

# Local
cat .sdd/runtime/logs/server.log

# Docker
docker logs bernstein-server

# Kubernetes
kubectl logs -n bernstein deploy/bernstein-server

Workers are not claiming tasks

Check 1: Auth token mismatch. Every node must share the same BERNSTEIN_AUTH_TOKEN:

# Docker Compose — inspect worker env
docker compose exec bernstein-worker env | grep AUTH_TOKEN

# Kubernetes — decode the secret
kubectl get secret bernstein-auth -n bernstein -o jsonpath='{.data.BERNSTEIN_AUTH_TOKEN}' | base64 -d

Check 2: Worker cannot reach the task server. Verify the BERNSTEIN_SERVER_URL is correct and reachable from the worker:

# From inside the worker container
docker compose exec bernstein-worker curl -s http://bernstein-server:8052/health

# Kubernetes
kubectl exec -n bernstein deploy/bernstein-worker -- curl -s http://bernstein-server:8052/health

Check 3: No open tasks. If the backlog is empty, workers have nothing to do:

curl "http://localhost:8052/tasks?status=open"

Port 8052 is already in use

A previous Bernstein session did not shut down cleanly. Find and stop it:

# Local — use Bernstein's own stop command
bernstein stop --force

# Or find the PID manually
cat .sdd/runtime/pids/server.json
kill <pid>

# Or kill by port
lsof -ti:8052 | xargs kill -9
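
Before killing anything, it can be useful to confirm that something is really listening on the port. The following is a generic sketch using only the standard library; it is not a Bernstein command, and port_in_use is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

port_in_use(8052) returning True while bernstein status reports no server suggests a stale process or an unrelated service holding the port.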

Agents spawn but exit immediately

Agents exit when they have no work or cannot authenticate. Check logs:

bernstein logs -f                       # follow all agent output
bernstein logs -a claude                # filter by agent name
tail -f .sdd/runtime/logs/*.log         # raw log files

Common causes:

| Symptom | Likely cause | Fix |
|---|---|---|
| AuthenticationError in log | API key missing or expired | Re-export ANTHROPIC_API_KEY etc. |
| Agent exits with code 1 immediately | CLI not authenticated | Run claude login / codex login |
| Connection refused to task server | Server not started | Check bernstein status |
| Agent claims task then fails it | Task prompt too long | Reduce scope in task config |

Tasks stuck in "claimed" status

An agent crashed before reporting completion. The task stays claimed until the janitor reclaims it (default: 5 minutes) or you force a reset:

# Auto-fix stale locks
bernstein doctor --fix

# Or restart cleanly
bernstein stop && bernstein

Stale claimed tasks appear in bernstein status with a "claimed for >5m" annotation.
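
The staleness rule can be illustrated with a few lines of Python. This is only a sketch of the idea, assuming each task record carries an id, a status, and a claimed_at timestamp; those field names are hypothetical and the janitor's real logic is not shown in this guide.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=5)  # documented janitor default


def stale_claims(tasks: list[dict], now: datetime) -> list[str]:
    """Return ids of tasks claimed for longer than STALE_AFTER."""
    return [
        task["id"]
        for task in tasks
        if task["status"] == "claimed" and now - task["claimed_at"] > STALE_AFTER
    ]


# Example: one task claimed 6 minutes ago, one claimed 1 minute ago.
_now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
_tasks = [
    {"id": "t1", "status": "claimed", "claimed_at": _now - timedelta(minutes=6)},
    {"id": "t2", "status": "claimed", "claimed_at": _now - timedelta(minutes=1)},
]
print(stale_claims(_tasks, _now))  # ['t1']
```

Only t1 crosses the five-minute threshold, which corresponds to the "claimed for >5m" annotation in bernstein status.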

Docker volume permissions

If the server cannot write to .sdd/, the named volume may be owned by root:

docker compose exec bernstein-server ls -la /workspace/.sdd
# If root-owned:
docker compose exec --user root bernstein-server chown -R bernstein:bernstein /workspace/.sdd

Kubernetes pod stuck in Pending

Usually a resource or PersistentVolumeClaim issue:

kubectl describe pod -n bernstein -l app.kubernetes.io/name=bernstein
kubectl get pvc -n bernstein

If PVC is in Pending, your cluster may not have a default StorageClass:

kubectl get storageclass
# Set one as default if none exists:
kubectl patch storageclass <name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Grafana shows no data

Check that Prometheus is scraping the task server:

# Docker Compose — open Prometheus targets page
open http://localhost:9090/targets

# Kubernetes
kubectl port-forward -n bernstein svc/prometheus 9090:9090 &
open http://localhost:9090/targets

If bernstein-server shows as DOWN, the metrics endpoint is not reachable. Verify the server is running and the Prometheus scrape config points to the correct host and port.

Still stuck?

  1. Run bernstein doctor — it checks the most common issues automatically.
  2. Check the Troubleshooting guide for agent-level issues (API errors, quality gate failures, cost overruns).
  3. Open an issue at sipyourdrink-ltd/bernstein with the output of bernstein doctor --json.