Deployment Guide¶

How to run Bernstein in different environments. Each section is self-contained with complete configuration examples.

Jump to:

Local development
CI/CD - GitHub Actions
CI/CD - GitLab CI
Docker single-host
Docker Compose cluster
Kubernetes / Helm
Cloudflare cloud deployment
Team shared server (bare metal)
Environment variable reference
Zero-downtime upgrades (blue-green)
Upgrading
Troubleshooting deployments

New to Bernstein? Complete the Quickstart Tutorial first to see orchestration in action before deploying.

Local development¶

Prerequisites¶

Python 3.12+
Git
uv (recommended) or pip
At least one supported CLI agent installed (Claude Code, Codex, Gemini CLI, etc.)
An API key for at least one LLM provider

Install¶

# With uv (recommended - faster, handles virtualenvs automatically)
uv pip install bernstein

# Or install from source
git clone https://github.com/sipyourdrink-ltd/bernstein
cd bernstein
uv pip install -e .

# Or with pip
pip install bernstein

First run¶

# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..."   # Claude
# export OPENAI_API_KEY="sk-..."        # GPT / Codex
# export GOOGLE_API_KEY="..."           # Gemini

# Initialize a project
cd /path/to/your/project
bernstein init

# Run a plan
bernstein run plans/my-project.yaml

# Or run interactively (type a goal, Bernstein decomposes it)
bernstein run

The task server starts on http://127.0.0.1:8052. State is stored in .sdd/ in the working directory. Add .sdd/ to .gitignore.

Local configuration file¶

Create bernstein.yaml in your project root:

# bernstein.yaml
cli: auto               # auto-detect installed agent (claude|codex|gemini|qwen)
model: sonnet           # default model
max_agents: 4           # concurrent agents; tune based on your API tier
budget: 5.00            # hard spending cap in USD (optional)

# Override model per role
role_model_policy:
  docs:
    model: haiku
  backend:
    model: sonnet
  architect:
    model: opus

# Share context files with all agents
context_files:
  - README.md
  - docs/architecture/ARCHITECTURE.md
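One way to read role_model_policy is as a per-role lookup that falls back to the top-level model. The sketch below illustrates that reading with a plain dictionary mirroring the YAML above. model_for_role is a hypothetical helper, and the fallback behaviour is an assumption rather than Bernstein's documented resolution order.

```python
# Plain-dict mirror of the bernstein.yaml example above.
CONFIG = {
    "model": "sonnet",
    "role_model_policy": {
        "docs": {"model": "haiku"},
        "backend": {"model": "sonnet"},
        "architect": {"model": "opus"},
    },
}


def model_for_role(config: dict, role: str) -> str:
    """Return the role's override if present, else the default model (assumed fallback)."""
    override = config.get("role_model_policy", {}).get(role, {})
    return override.get("model", config["model"])
```

Under this reading, a role without an entry (for example a hypothetical qa role) would run on the default model.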

Verify the install¶

bernstein doctor        # checks dependencies, API keys, git setup
bernstein status        # shows task server state

Smoke test¶

Run the zero-config demo to confirm everything works end-to-end before using it on real code:

bernstein quickstart --keep    # runs 3 tasks on a demo project, keeps output

Expected: all 3 tasks complete and a summary table prints with elapsed time and cost.

CI/CD - GitHub Actions¶

Single-shot plan execution¶

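Run a plan file on demand. The workflow below only defines a workflow_dispatch trigger; add a push trigger if you also want it to run on every push: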

# .github/workflows/bernstein.yml
name: Bernstein Agent Run
on:
  workflow_dispatch:
    inputs:
      plan:
        description: "Plan file (relative to repo root)"
        required: true
        default: "plans/ci-tasks.yaml"
      max_agents:
        description: "Max concurrent agents"
        required: false
        default: "2"

jobs:
  bernstein:
    runs-on: ubuntu-latest
    timeout-minutes: 60

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0        # full history needed for worktree operations

      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"

      - name: Install Bernstein
        run: pip install bernstein

      - name: Install Claude Code (or your preferred CLI agent)
        run: npm install -g @anthropic-ai/claude-code
        # Alternatively:
        # run: npm install -g @openai/codex

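      - name: Run plan
        run: |
          bernstein run "$PLAN" \
            --max-agents "$MAX_AGENTS" \
            --budget 10.00
        env:
          # Inputs go through env vars so they are never interpolated into the script
          PLAN: ${{ inputs.plan }}
          MAX_AGENTS: ${{ inputs.max_agents }}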
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          # OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          BERNSTEIN_LOG_JSON: "true"
          BERNSTEIN_NO_TUI: "true"    # disable interactive TUI in CI

      - name: Upload state artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: bernstein-state-${{ github.run_id }}
          path: .sdd/
          retention-days: 7

      - name: Post summary
        if: always()
        run: bernstein report >> $GITHUB_STEP_SUMMARY

Running Bernstein from a workflow¶

Install Bernstein with pip and invoke the CLI directly in a run step:

# .github/workflows/bernstein-action.yml
name: Bernstein
on:
  push:
    branches: [main]

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install Bernstein
        run: pip install bernstein
      - name: Run plan
        run: bernstein run plans/ci-tasks.yaml --max-agents 2 --budget 5.00
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

See docs/integrations/github-action.md for the full parameter reference.

Storing secrets¶

# Add secrets to your repository
gh secret set ANTHROPIC_API_KEY --body "sk-ant-..."
gh secret set OPENAI_API_KEY --body "sk-..."

Never commit API keys to your repository. Use GitHub Secrets for all provider credentials.

Verify CI output¶

A successful run uploads the .sdd/ state artifact and posts a markdown summary. In the Actions UI you should see:

A green bernstein job
.sdd/ uploaded under Artifacts
A step summary with task counts, duration, and cost

If the job fails with a non-zero exit, check the "Run plan" step logs. The most common causes are missing secrets and expired API keys.
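Because GitHub Actions evaluates a reference to an unset secret as an empty string, a missing key often surfaces only as an authentication failure deep in the run. A preflight step can fail fast instead. The sketch below is one possible check, not something Bernstein ships; configured_providers is a hypothetical helper, and the key names are the ones used in this guide.

```python
import os
from typing import Mapping

PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY")


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the provider key names that are set to a non-empty value."""
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]
```

A workflow step could call this before "Run plan" and exit non-zero when the returned list is empty. It catches missing secrets only; an expired key still has to be detected by the provider.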

CI/CD - GitLab CI¶

Basic pipeline stage¶

# .gitlab-ci.yml
bernstein:
  image: python:3.12-slim
  stage: build
  timeout: 60 minutes

  before_script:
    - apt-get update -q && apt-get install -y -q git npm
    - pip install bernstein
    - npm install -g @anthropic-ai/claude-code

  script:
    - bernstein run plans/ci-tasks.yaml --max-agents 2 --budget 10.00

  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY   # set in GitLab CI/CD settings
    BERNSTEIN_LOG_JSON: "true"
    BERNSTEIN_NO_TUI: "true"    # disable interactive TUI in CI

  artifacts:
    paths:
      - .sdd/
    expire_in: 7 days
    when: always

Caching pip packages between runs¶

bernstein:
  image: python:3.12-slim
  stage: build

  cache:
    key: bernstein-pip-$CI_COMMIT_REF_SLUG
    paths:
      - .pip-cache/

  before_script:
    - apt-get update -q && apt-get install -y -q git npm
    - pip install --cache-dir .pip-cache bernstein
    - npm install -g @anthropic-ai/claude-code

  script:
    - bernstein run plans/ci-tasks.yaml --max-agents 2

  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
    BERNSTEIN_NO_TUI: "true"
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.pip-cache"

Multi-project pipeline (scheduled)¶

# .gitlab-ci.yml - runs on a pipeline schedule (weekly in this example)
weekly-refactor:
  image: python:3.12-slim
  stage: maintenance
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    - apt-get update -q && apt-get install -y -q git npm
    - pip install bernstein
    - npm install -g @anthropic-ai/claude-code
    - bernstein run plans/weekly-maintenance.yaml --max-agents 3
  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY

Protected variables (GitLab secrets)¶

Go to Settings → CI/CD → Variables.
Add ANTHROPIC_API_KEY with Protected and Masked flags enabled.
The variable is then exposed only to pipelines that run on protected branches and tags.

Docker single-host¶

.env file¶

Copy the example and fill in your keys:

# .env  (never commit this file)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
BERNSTEIN_AUTH_TOKEN=change-me-in-production
BERNSTEIN_MAX_AGENTS=4
BERNSTEIN_DASHBOARD_PASSWORD=change-me

Dockerfile¶

FROM python:3.12-slim

WORKDIR /workspace

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    npm \
    && rm -rf /var/lib/apt/lists/*

# Install a CLI agent
RUN npm install -g @anthropic-ai/claude-code

# Install Bernstein
RUN pip install --no-cache-dir bernstein

# Non-root user for security
RUN useradd -m -u 1000 bernstein && chown -R bernstein /workspace
USER bernstein

# State directory
VOLUME ["/workspace/.sdd"]

ENV BERNSTEIN_BIND_HOST=0.0.0.0
ENV BERNSTEIN_PORT=8052
ENV BERNSTEIN_NO_TUI=true

EXPOSE 8052

HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD curl -f http://localhost:8052/health || exit 1

CMD ["bernstein", "conduct"]

docker build -t bernstein:latest .
docker run -d \
  --name bernstein \
  --env-file .env \
  -p 8052:8052 \
  -v bernstein-state:/workspace/.sdd \
  -v "$(pwd)":/workspace/project \
  bernstein:latest

Docker Compose cluster¶

The included docker-compose.yaml runs a full cluster: task server, orchestrator, scalable workers, PostgreSQL, Redis, Prometheus, and Grafana.

Setup¶

# Create the env file: set ANTHROPIC_API_KEY and BERNSTEIN_AUTH_TOKEN
cat > .env <<'EOF'
BERNSTEIN_AUTH_TOKEN=change-me
ANTHROPIC_API_KEY=sk-ant-...
EOF

# Start the full stack
docker compose up -d

# Scale workers (each worker claims tasks from the shared server)
docker compose up -d --scale bernstein-worker=4

# View logs
docker compose logs -f bernstein-server
docker compose logs -f bernstein-orchestrator

Single-worker task server. Only bernstein-worker replicas scale horizontally. The bernstein-server container must run with exactly one uvicorn worker: the in-process TaskStore holds state in memory and guards mutations with an asyncio.Lock, which only serializes coroutines inside a single process. Running uvicorn --workers N (or setting WEB_CONCURRENCY>1 / BERNSTEIN_WORKERS>1) would interleave JSONL appends and let two workers claim the same task, so the server refuses to boot when multi-worker mode is requested. For parallelism, use a horizontal pool of bernstein-worker replicas, or migrate to the SQLite/Redis backends (tracked separately).
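The single-process constraint follows from how asyncio.Lock works: it orders coroutines on one event loop and is invisible to any other process. The toy store below is a sketch of that idea only. InMemoryStore and claim are hypothetical names, not Bernstein's TaskStore API.

```python
import asyncio


class InMemoryStore:
    """Toy stand-in for an in-process task store (not Bernstein's TaskStore)."""

    def __init__(self, task_ids: list[str]) -> None:
        self._owner: dict[str, str | None] = {tid: None for tid in task_ids}
        self._lock = asyncio.Lock()

    async def claim(self, task_id: str, worker: str) -> bool:
        """Claim a task; only the first caller wins."""
        async with self._lock:
            if self._owner[task_id] is not None:
                return False
            await asyncio.sleep(0)  # yield, as real I/O would, between check and write
            self._owner[task_id] = worker
            return True


async def race() -> list[bool]:
    # Four workers race for the same task inside one event loop.
    store = InMemoryStore(["t1"])
    return await asyncio.gather(*(store.claim("t1", f"w{i}") for i in range(4)))


results = asyncio.run(race())
```

Within one process exactly one claim succeeds. A second uvicorn worker would hold its own copy of the store and its own lock, so each process could hand out t1 independently.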

Backing up state¶

.sdd/ is mounted as a named volume (sdd-data). To back it up:

docker run --rm -v bernstein_sdd-data:/data -v "$(pwd)":/backup \
  alpine tar czf /backup/sdd-backup.tar.gz /data

Service endpoints¶

Service	URL	Purpose
Task server + dashboard	`http://localhost:8052/dashboard`	Web UI, task management
Task server API	`http://localhost:8052`	REST API
Prometheus	`http://localhost:9090`	Metrics
Grafana	`http://localhost:3000`	Agent dashboards (admin/admin)

Stopping and cleanup¶

docker compose down          # stop containers, keep volumes
docker compose down -v       # stop and delete all data volumes

Verify the cluster is healthy¶

# All containers should be "Up"
docker compose ps

# Task server health endpoint
curl http://localhost:8052/health

# Check the dashboard (open is macOS-only; use xdg-open on Linux)
open http://localhost:8052/dashboard

Expected health response:

{"status": "ok", "tasks_open": 0, "tasks_claimed": 0, "tasks_done": 0}
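A deployment script can gate on this response rather than relying on a visual check. The sketch below assumes the /health body has the shape shown above and treats anything other than status "ok" as unhealthy. is_healthy and fetch_health are hypothetical helper names, not part of Bernstein.

```python
import json
import urllib.request


def is_healthy(payload: str) -> bool:
    """Return True when a /health body is a JSON object with status "ok"."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def fetch_health(url: str = "http://localhost:8052/health", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; connection errors and timeouts count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:
        return False
```

Calling fetch_health() in a retry loop after docker compose up -d gives the stack time to start before the script gives up.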

Kubernetes / Helm¶

Prerequisites¶

Kubernetes 1.24+
Helm 3.x
kubectl configured for your cluster
Persistent storage (EBS, NFS, local-path provisioner, etc.)

Add the Bitnami repo first - the chart's PostgreSQL and Redis sub-charts depend on it:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

Install with Helm¶

# From the local chart
helm install bernstein ./deploy/helm/bernstein \
  --namespace bernstein \
  --create-namespace \
  -f my-values.yaml

# Or add the Helm repo (when published)
helm repo add bernstein https://charts.bernstein.dev
helm repo update
helm install bernstein bernstein/bernstein \
  --namespace bernstein \
  --create-namespace

Provider API keys (Kubernetes secret)¶

kubectl create secret generic bernstein-provider-keys \
  --namespace bernstein \
  --from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
  --from-literal=OPENAI_API_KEY="sk-..."

`values.yaml` - complete example¶

# my-values.yaml
image:
  repository: bernstein
  tag: latest
  pullPolicy: IfNotPresent

server:
  replicaCount: 1
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "4Gi"
      cpu: "2000m"
  persistence:
    enabled: true
    storageClass: ""     # use cluster default
    size: 10Gi
  service:
    type: ClusterIP
    port: 8052

worker:
  replicaCount: 2
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "8Gi"
      cpu: "4000m"
  persistence:
    enabled: true
    storageClass: ""
    size: 20Gi           # larger: holds git worktrees
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 20
    targetQueueDepth: "2"           # scale up when 2+ tasks per worker
    targetCPUUtilizationPercentage: 70

providerKeys:
  existingSecret: bernstein-provider-keys

auth:
  enabled: true
  # existingSecret: my-auth-secret   # use an existing secret instead

config:
  maxAgents: 6
  logLevel: INFO
  clusterEnabled: true

monitoring:
  prometheus:
    enabled: true
  grafana:
    enabled: true
    adminPassword: "change-me"

helm upgrade --install bernstein ./deploy/helm/bernstein \
  --namespace bernstein \
  --create-namespace \
  -f my-values.yaml

# Verify
kubectl get pods -n bernstein
kubectl port-forward -n bernstein svc/bernstein-server 8052:8052

For the full Helm chart parameter reference, see docs/operations/HELM_DEPLOYMENT.md.
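To get a feel for the autoscaling values in the example, the arithmetic can be sketched directly. This assumes the chart treats targetQueueDepth as a per-worker average, in the way a Kubernetes HPA average-value metric is evaluated. It ignores the CPU target, and desired_workers is a hypothetical helper rather than chart code.

```python
import math


def desired_workers(queue_depth: int, target_per_worker: int = 2,
                    min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Replica count that keeps queued tasks per worker at or below the target."""
    if target_per_worker <= 0:
        raise ValueError("target_per_worker must be positive")
    wanted = math.ceil(queue_depth / target_per_worker)
    # Clamp to the minReplicas / maxReplicas bounds from the values file.
    return max(min_replicas, min(max_replicas, wanted))
```

With the example values, 7 queued tasks call for 4 workers, and the count never exceeds 20 however deep the queue gets.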

Common overrides¶

Scale workers:

helm upgrade bernstein ./deploy/helm/bernstein \
  --namespace bernstein \
  --set worker.replicaCount=8

Disable HPA (fixed worker count):

--set worker.autoscaling.enabled=false

Expose the task server via ingress:

--set ingress.enabled=true \
--set ingress.className=nginx \
--set "ingress.hosts[0].host=bernstein.example.com" \
--set "ingress.hosts[0].paths[0].path=/" \
--set "ingress.hosts[0].paths[0].pathType=Prefix"

Use external PostgreSQL/Redis (e.g. managed cloud services):

--set postgresql.enabled=false \
--set redis.enabled=false \
--set externalDatabase.url="postgresql://user:pass@host:5432/bernstein" \
--set externalRedis.url="redis://host:6379/0"

Architecture¶

                          ┌─────────────────┐
                          │  Ingress (opt.)  │
                          └────────┬────────┘
                                   │
                    ┌──────────────▼──────────────┐
                    │      bernstein-server        │
                    │   Deployment + Service       │
                    │   (ClusterIP :8052)          │
                    └──┬──────────────────────┬───┘
                       │                      │
         ┌─────────────▼─────────┐   ┌────────▼────────────┐
         │ bernstein-orchestrator│   │  bernstein-worker    │
         │  Deployment (1 pod)   │   │  StatefulSet (N pods)│
         │  run --remote         │   │  worker --server ... │
         └───────────────────────┘   └─────────────────────┘
                       │                      │
              ┌────────▼────────┐   ┌─────────▼──────────┐
              │   PostgreSQL    │   │       Redis          │
              │  (bitnami chart)│   │  (bitnami chart)    │
              └─────────────────┘   └────────────────────┘

Resource sizing guide¶

Role	Replicas	CPU req	Mem req	Notes
server	1	100m	256Mi	Stateful - single replica
orchestrator	1	100m	128Mi	Reads backlog, no heavy compute
worker	2-20	500m	512Mi	Scale based on task throughput

Workers make outbound calls to LLM APIs and run claude/codex/gemini CLI binaries. They do not need GPUs.

Secrets management¶

Never put API keys in values.yaml. Use one of:

Kubernetes Secrets (kubectl create secret) - simplest
External Secrets Operator - sync from AWS Secrets Manager, Vault, GCP Secret Manager
Sealed Secrets - encrypted secrets committed to git

Health checks¶

# Task server health
kubectl exec -n bernstein deploy/bernstein-server -- \
  curl -s http://localhost:8052/health

# Live task queue
kubectl exec -n bernstein deploy/bernstein-server -- \
  curl -s http://localhost:8052/status

Raw Kubernetes manifests (without Helm)¶

# bernstein.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bernstein
  namespace: bernstein
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bernstein
  template:
    metadata:
      labels:
        app: bernstein
    spec:
      containers:
        - name: bernstein
          image: bernstein:latest
          ports:
            - containerPort: 8052
              name: http
            - containerPort: 9090
              name: metrics
          env:
            - name: BERNSTEIN_BIND_HOST
              value: "0.0.0.0"
            - name: BERNSTEIN_CLUSTER_ENABLED
              value: "true"
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: bernstein-provider-keys
                  key: ANTHROPIC_API_KEY
          volumeMounts:
            - name: state
              mountPath: /workspace/.sdd
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
      volumes:
        - name: state
          persistentVolumeClaim:
            claimName: bernstein-state
---
apiVersion: v1
kind: Service
metadata:
  name: bernstein
  namespace: bernstein
spec:
  selector:
    app: bernstein
  ports:
    - port: 8052
      targetPort: http
      name: http
    - port: 9090
      targetPort: metrics
      name: metrics
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bernstein-state
  namespace: bernstein
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

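# The namespace and the bernstein-provider-keys secret must exist before applying
kubectl create namespace bernstein   # skip if it already exists
kubectl apply -f bernstein.yaml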
kubectl get pods -n bernstein

Cloudflare cloud deployment¶

Bernstein can execute agents on Cloudflare's edge infrastructure instead of local processes. This is useful for teams that want centralized billing, isolated sandboxes for untrusted code, or global low-latency agent dispatch.

Quick start¶

# 1. Install wrangler
npm install -g wrangler
wrangler login

# 2. Deploy the agent Worker
bernstein cloud deploy --worker-name bernstein-agent

# 3. Authenticate with Bernstein Cloud
bernstein cloud login

# 4. Run orchestration in the cloud
bernstein cloud run "Add OAuth2 authentication" --max-agents 5 --budget 25.00

What runs where¶

Component	Location	Purpose
Orchestrator	Local or your server	Deterministic tick loop, task scheduling
Agent execution	Cloudflare Workers / Sandboxes	Code generation, testing
Workspace files	Cloudflare R2	File sync between local and cloud agents
Analytics / billing	Cloudflare D1	Usage metering, quota enforcement
LLM response cache	Cloudflare Vectorize	Semantic prompt deduplication
Internal LLM (planning)	Cloudflare Workers AI	Free-tier models for task decomposition

Environment variables for Cloudflare¶

Variable	Description
`CLOUDFLARE_ACCOUNT_ID`	Cloudflare account identifier
`CLOUDFLARE_API_TOKEN`	API token with Workers, R2, D1, Vectorize permissions
`BERNSTEIN_CLOUD_API_KEY`	API key for bernstein.run hosted service

For the full Cloudflare setup guide including R2 buckets, D1 databases, and Vectorize indexes, see the Cloudflare Setup documentation.

Team shared server¶

Run Bernstein on a dedicated server that multiple developers share. Each developer points their local tools at the shared task server.

Server setup (systemd)¶

# Create a system user
sudo useradd -r -s /bin/bash -d /opt/bernstein bernstein
sudo mkdir -p /opt/bernstein/workspace /var/lib/bernstein/.sdd
sudo chown -R bernstein:bernstein /opt/bernstein /var/lib/bernstein

# Install into a virtualenv
sudo -u bernstein python3.12 -m venv /opt/bernstein/venv
sudo -u bernstein /opt/bernstein/venv/bin/pip install bernstein

# Install a CLI agent (system-wide or in the venv)
npm install -g @anthropic-ai/claude-code

# /etc/systemd/system/bernstein.service
[Unit]
Description=Bernstein Orchestrator
After=network.target
Wants=network.target

[Service]
Type=simple
User=bernstein
Group=bernstein
WorkingDirectory=/opt/bernstein/workspace

ExecStart=/opt/bernstein/venv/bin/bernstein start
ExecStop=/opt/bernstein/venv/bin/bernstein stop --hard

Restart=on-failure
RestartSec=10

# Secrets - set via EnvironmentFile in production
EnvironmentFile=/etc/bernstein/env
Environment=BERNSTEIN_SDD_DIR=/var/lib/bernstein/.sdd
Environment=BERNSTEIN_BIND_HOST=0.0.0.0
Environment=BERNSTEIN_PORT=8052
Environment=BERNSTEIN_LOG_JSON=true
Environment=BERNSTEIN_MAX_AGENTS=8

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
PrivateTmp=true
ReadWritePaths=/var/lib/bernstein /opt/bernstein/workspace

[Install]
WantedBy=multi-user.target

# /etc/bernstein/env  (mode 0600, owned by bernstein)
ANTHROPIC_API_KEY=sk-ant-...
BERNSTEIN_AUTH_TOKEN=strong-random-secret
BERNSTEIN_DASHBOARD_PASSWORD=another-strong-password

sudo systemctl daemon-reload
sudo systemctl enable bernstein
sudo systemctl start bernstein
sudo journalctl -u bernstein -f

Connecting as a team member¶

Each developer configures their local bernstein.yaml to point at the shared server:

# bernstein.yaml (developer's local project)
server_url: http://bernstein.internal:8052
# or via env: BERNSTEIN_SERVER_URL=http://bernstein.internal:8052

# Submit tasks to the shared server without running a local orchestrator
bernstein add-task "Implement login page" --role frontend --priority 2
bernstein status   # see what the shared server is running
bernstein ps       # list active agents

Reverse proxy (nginx)¶

Expose the dashboard behind TLS:

# /etc/nginx/sites-enabled/bernstein
server {
    listen 443 ssl;
    server_name bernstein.internal;

    ssl_certificate     /etc/ssl/certs/bernstein.crt;
    ssl_certificate_key /etc/ssl/private/bernstein.key;

    # Dashboard
    location / {
        proxy_pass http://127.0.0.1:8052;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;

        # Required for SSE (live dashboard streaming)
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 600s;
    }
}

Caddy alternative (automatic HTTPS):

# /etc/caddy/Caddyfile
bernstein.internal {
    reverse_proxy 127.0.0.1:8052
}

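Caddy obtains and renews certificates automatically once it is running (systemctl start caddy). For a publicly resolvable hostname it uses Let's Encrypt. A private name such as bernstein.internal cannot be validated by a public CA, so either use a public DNS name or have Caddy issue the certificate from its local CA (the tls internal directive) and trust that CA on client machines.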

Once TLS is in place, point remote workers at the https:// URL and pair the bearer token with TLS so it is never transmitted in the clear:

BERNSTEIN_SERVER_URL=https://bernstein.internal \
  BERNSTEIN_AUTH_TOKEN=<secret> \
  bernstein worker
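Client-side tooling can enforce the same rule by refusing to attach the token to a non-TLS URL. The sketch below is illustrative: worker_headers is a hypothetical helper, and the exact header Bernstein expects is an assumption based on the standard bearer scheme.

```python
from urllib.parse import urlsplit


def worker_headers(server_url: str, token: str) -> dict[str, str]:
    """Build auth headers, refusing to attach the token to a non-TLS URL."""
    if urlsplit(server_url).scheme != "https":
        raise ValueError(f"refusing to send a token over {server_url!r}: not https")
    # "Authorization: Bearer <token>" is the standard bearer scheme (RFC 6750);
    # that Bernstein expects exactly this header is an assumption.
    return {"Authorization": f"Bearer {token}"}
```

Failing loudly here is deliberate: a silent fallback to http:// would leak the shared secret to anyone on the network path.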

Multi-project setup¶

Run separate orchestrators on different ports for project isolation:

# /etc/systemd/system/bernstein@.service  (template unit)
[Unit]
Description=Bernstein Orchestrator - %i
After=network.target

[Service]
Type=simple
User=bernstein
WorkingDirectory=/opt/bernstein/projects/%i
EnvironmentFile=/etc/bernstein/%i.env
ExecStart=/opt/bernstein/venv/bin/bernstein start

[Install]
WantedBy=multi-user.target

# Start both projects; each listens on the BERNSTEIN_PORT set in its
# /etc/bernstein/<name>.env file (for example 8052 for project-a, 8053 for project-b)
sudo systemctl start bernstein@project-a
sudo systemctl start bernstein@project-b

Environment variables¶

Variable	Default	Description
`ANTHROPIC_API_KEY`	-	Claude API key
`OPENAI_API_KEY`	-	OpenAI / Codex API key
`GOOGLE_API_KEY`	-	Gemini API key
`BERNSTEIN_SERVER_URL`	`http://127.0.0.1:8052`	Task server URL (for remote workers)
`BERNSTEIN_BIND_HOST`	`127.0.0.1`	Server bind address
`BERNSTEIN_PORT`	`8052`	Server port
`BERNSTEIN_MAX_AGENTS`	`6`	Max concurrent agents
`BERNSTEIN_AUTH_TOKEN`	-	Inter-node auth secret (cluster mode)
`BERNSTEIN_DASHBOARD_PASSWORD`	-	Dashboard HTTP auth password
`BERNSTEIN_STORAGE_BACKEND`	`memory`	`memory`, `postgres`, or `redis`
`BERNSTEIN_DATABASE_URL`	-	PostgreSQL DSN (e.g. `postgresql://user:pass@host/db`)
`BERNSTEIN_REDIS_URL`	-	Redis URL (e.g. `redis://localhost:6379/0`)
`BERNSTEIN_CLUSTER_ENABLED`	`false`	Enable multi-node cluster mode
`BERNSTEIN_LOG_LEVEL`	`INFO`	Log verbosity (`DEBUG`/`INFO`/`WARNING`/`ERROR`)
`BERNSTEIN_LOG_JSON`	`false`	Emit JSON log lines (for log aggregators)
`BERNSTEIN_BUDGET`	-	Hard spending cap in USD
`BERNSTEIN_SKIP_GATES`	-	Skip quality gates (requires `BERNSTEIN_SKIP_GATE_REASON`)
`BERNSTEIN_NO_TUI`	-	Disable interactive TUI (useful in CI)
`BERNSTEIN_QUIET`	-	Suppress all non-error output
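When scripting around a deployment it can help to print the effective settings before starting anything. The sketch below resolves a few of the variables above against their documented defaults. ServerSettings and load_settings are hypothetical names, and the boolean parsing rule is an assumption, not Bernstein's own loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServerSettings:
    bind_host: str
    port: int
    max_agents: int
    cluster_enabled: bool
    log_level: str


def _as_bool(value: str) -> bool:
    # Assumed convention: these spellings mean "enabled", anything else is off.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Resolve settings from the environment, falling back to the table's defaults."""
    return ServerSettings(
        bind_host=env.get("BERNSTEIN_BIND_HOST", "127.0.0.1"),
        port=int(env.get("BERNSTEIN_PORT", "8052")),
        max_agents=int(env.get("BERNSTEIN_MAX_AGENTS", "6")),
        cluster_enabled=_as_bool(env.get("BERNSTEIN_CLUSTER_ENABLED", "false")),
        log_level=env.get("BERNSTEIN_LOG_LEVEL", "INFO"),
    )
```

Passing a plain dictionary instead of os.environ makes the resolution easy to check for a given env file.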

Zero-downtime upgrades (blue-green)¶

Bernstein supports blue-green deployments to upgrade the server without dropping in-flight tasks. The mechanism swaps the .sdd/ symlink between two parallel state directories (.sdd-blue/ and .sdd-green/), letting the new version warm up before traffic switches.

How it works¶

.sdd/  →  .sdd-blue/    ← current live state
          .sdd-green/   ← new version (being prepared)

On switch_traffic(), the .sdd/ symlink is atomically re-pointed at .sdd-green/. If the health check fails, rollback() re-points it back to .sdd-blue/.
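The atomic re-pointing can be illustrated with the usual POSIX idiom: create a temporary symlink, then rename it over the live one. This is a sketch of the general technique on a POSIX filesystem, not necessarily how BlueGreenDeployment implements switch_traffic(). repoint_symlink is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def repoint_symlink(link: Path, target: Path) -> None:
    """Atomically make `link` point at `target` (POSIX rename semantics)."""
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # clear a leftover from an interrupted earlier swap
    os.symlink(target, tmp, target_is_directory=True)
    os.replace(tmp, link)  # rename(2) swaps the link in a single step


with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    (base / ".sdd-blue").mkdir()
    (base / ".sdd-green").mkdir()
    live = base / ".sdd"
    repoint_symlink(live, base / ".sdd-blue")   # initial state: blue is live
    before = os.readlink(live)
    repoint_symlink(live, base / ".sdd-green")  # switch traffic to green
    after = os.readlink(live)
```

Because the rename either happens or does not, a reader resolving .sdd/ sees blue or green and never a missing link. Rolling back is the same call with the blue directory as the target.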

Python API¶

from pathlib import Path
from bernstein.core.orchestration.blue_green import BlueGreenConfig, BlueGreenDeployment

cfg = BlueGreenConfig(
    health_check_url="http://127.0.0.1:8052/status",
    rollback_on_error=True,
    switch_delay_seconds=10,
)
deploy = BlueGreenDeployment(cfg, base_dir=Path("."))

# 1. Prepare the green environment with the new version
green_path = deploy.prepare_green("2.1.0")

# 2. Start the new server process pointing at green_path
# ... start bernstein with BERNSTEIN_SDD_DIR=green_path ...

# 3. Check health
if deploy.health_check():
    deploy.switch_traffic()   # symlink: .sdd/ → .sdd-green/
else:
    deploy.rollback()         # stays on blue; green is discarded

Upgrade procedure (bare metal)¶

# 1. Install the new version alongside the old, in its own virtualenv
python3.12 -m venv /opt/bernstein/v2.1.0
/opt/bernstein/v2.1.0/bin/pip install bernstein==2.1.0

# 2. Start the new server on a staging port
BERNSTEIN_PORT=8053 BERNSTEIN_SDD_DIR=.sdd-green \
  /opt/bernstein/v2.1.0/bin/bernstein start &

# 3. Verify it is healthy
curl http://127.0.0.1:8053/status

# 4. Switch traffic via the Python API
python3 -c "
from pathlib import Path
from bernstein.core.orchestration.blue_green import BlueGreenConfig, BlueGreenDeployment
cfg = BlueGreenConfig(health_check_url='http://127.0.0.1:8053/status')
BlueGreenDeployment(cfg, Path('.')).switch_traffic()
"

# 5. Stop the old server
kill $(cat .sdd-blue/runtime/server.pid)

Check deployment status¶

status = deploy.status()
print(status.active)         # "blue" or "green"
print(status.blue_version)   # "2.0.0"
print(status.green_version)  # "2.1.0"
print(status.healthy)        # True / False

Upgrading¶

Stop the running instance: bernstein stop
Back up state: cp -r .sdd .sdd.backup-$(date +%Y%m%d)
Install the new version: pip install --upgrade bernstein
Start: bernstein run

State format is forward-compatible between minor versions. For major version upgrades, check operations/migrations.md for breaking changes.

To roll back: pip install bernstein==<previous-version> and restore .sdd/ from the .sdd.backup-<date> copy made before upgrading.

For zero-downtime upgrades on production servers, use the blue-green procedure above.

Troubleshooting deployments¶

Task server health check fails on startup¶

The server may be waiting for PostgreSQL or Redis to be ready. Check dependencies first:

# Docker Compose
docker compose logs postgres
docker compose logs redis

# Kubernetes
kubectl logs -n bernstein -l app.kubernetes.io/component=postgresql
kubectl get events -n bernstein --sort-by='.lastTimestamp'

If the server crashes immediately, check the server log directly:

# Local
cat .sdd/runtime/logs/server.log

# Docker
docker logs bernstein-server

# Kubernetes
kubectl logs -n bernstein deploy/bernstein-server

Workers are not claiming tasks¶

Check 1: Auth token mismatch. Every node must share the same BERNSTEIN_AUTH_TOKEN:

# Docker Compose - inspect worker env
docker compose exec bernstein-worker env | grep AUTH_TOKEN

# Kubernetes - decode the secret
kubectl get secret bernstein-auth -n bernstein -o jsonpath='{.data.BERNSTEIN_AUTH_TOKEN}' | base64 -d

Check 2: Worker cannot reach the task server. Verify the BERNSTEIN_SERVER_URL is correct and reachable from the worker:

# From inside the worker container
docker compose exec bernstein-worker curl -s http://bernstein-server:8052/health

# Kubernetes
kubectl exec -n bernstein deploy/bernstein-worker -- curl -s http://bernstein-server:8052/health

Check 3: No open tasks. If the backlog is empty, workers have nothing to do:

curl "http://localhost:8052/tasks?status=open"

Port 8052 is already in use¶

A previous Bernstein session did not shut down cleanly. Find and stop it:

# Local - use Bernstein's own stop command
bernstein stop --force

# Or find the PID manually
cat .sdd/runtime/pids/server.json
kill <pid>

# Or kill by port
lsof -ti:8052 | xargs kill -9
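Before killing anything, it is worth confirming that the port is really occupied. The sketch below is a small standalone probe using only the standard library. port_in_use is a hypothetical helper and is not a Bernstein command.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0
```

port_in_use(8052) returning False while Bernstein still reports a conflict suggests the server is configured for a different bind address or port.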

Agents spawn but exit immediately¶

Agents exit when they have no work or cannot authenticate. Check logs:

bernstein logs tail -f                  # follow all agent output
bernstein logs tail -a claude           # filter by agent name
tail -f .sdd/runtime/logs/*.log         # raw log files

Common causes:

Symptom	Likely cause	Fix
`AuthenticationError` in log	API key missing or expired	Re-export `ANTHROPIC_API_KEY` etc.
Agent exits with code 1 immediately	CLI not authenticated	Run `claude login` / `codex login`
`Connection refused` to task server	Server not started	Check `bernstein status`
Agent claims task then fails it	Task prompt too long	Reduce `scope` in task config

Tasks stuck in "claimed" status¶

An agent crashed before reporting completion. The task stays claimed until the janitor reclaims it (default: 5 minutes) or you force a reset:

# Auto-fix stale locks
bernstein doctor --fix

# Or restart cleanly
bernstein stop && bernstein run

Stale claimed tasks appear in bernstein status with a "claimed for >5m" annotation.

Docker volume permissions¶

If the server cannot write to .sdd/, the named volume may be owned by root:

docker compose exec bernstein-server ls -la /workspace/.sdd
# If root-owned:
docker compose exec --user root bernstein-server chown -R bernstein:bernstein /workspace/.sdd

Kubernetes pod stuck in `Pending`¶

Usually a resource or PersistentVolumeClaim issue:

kubectl describe pod -n bernstein -l app.kubernetes.io/name=bernstein
kubectl get pvc -n bernstein

If PVC is in Pending, your cluster may not have a default StorageClass:

kubectl get storageclass
# Set one as default if none exists:
kubectl patch storageclass <name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Grafana shows no data¶

Check that Prometheus is scraping the task server:

# Docker Compose - open Prometheus targets page
open http://localhost:9090/targets

# Kubernetes
kubectl port-forward -n bernstein svc/prometheus 9090:9090 &
open http://localhost:9090/targets

If bernstein-server shows as DOWN, the metrics endpoint is not reachable. Verify the server is running and the Prometheus scrape config points to the correct host and port.

Still stuck?¶

Run bernstein doctor - it checks the most common issues automatically.
Check the Troubleshooting guide for agent-level issues (API errors, quality gate failures, cost overruns).
Open an issue at sipyourdrink-ltd/bernstein with the output of bernstein doctor --json.
