Deployment Guide
How to run Bernstein in different environments. Each section is self-contained with complete configuration examples.
Jump to:
- Local development
- CI/CD — GitHub Actions
- CI/CD — GitLab CI
- Docker single-host
- Docker Compose cluster
- Kubernetes / Helm
- Cloudflare cloud deployment
- Team shared server (bare metal)
- Environment variable reference
- Zero-downtime upgrades (blue-green)
- Upgrading
- Troubleshooting deployments
New to Bernstein? Complete the Quickstart Tutorial first to see orchestration in action before deploying.
Local development
Prerequisites
- Python 3.12+
- Git
- uv (recommended) or pip
- At least one supported CLI agent installed (Claude Code, Codex, Gemini CLI, etc.)
- An API key for at least one LLM provider
Install
# With uv (recommended, faster; installs into the active or project .venv, so create one with uv venv first)
uv pip install bernstein
# Or install from source
git clone https://github.com/bernstein-ai/bernstein
cd bernstein
uv pip install -e .
# Or with pip
pip install bernstein
First run
# Set your API key
export ANTHROPIC_API_KEY="sk-ant-..." # Claude
# export OPENAI_API_KEY="sk-..." # GPT / Codex
# export GOOGLE_API_KEY="..." # Gemini
# Initialize a project
cd /path/to/your/project
bernstein init
# Run a plan
bernstein run plans/my-project.yaml
# Or run interactively (type a goal, Bernstein decomposes it)
bernstein run
The task server starts on http://127.0.0.1:8052. State is stored in .sdd/ in the working directory. Add .sdd/ to .gitignore.
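If you script project setup, the .gitignore step can be made idempotent so re-running it never duplicates the entry. A minimal sketch; ensure_gitignored is a hypothetical helper, not part of Bernstein, and it only matches the exact line .sdd/:

```python
from pathlib import Path


def ensure_gitignored(repo_root: Path, entry: str = ".sdd/") -> bool:
    """Add `entry` to <repo_root>/.gitignore unless it is already listed.

    Returns True if the file was changed, False if the entry was present.
    """
    gitignore = repo_root / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    # Compare stripped lines so trailing whitespace does not cause duplicates.
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if not text or text.endswith("\n") else "\n"
    gitignore.write_text(text + prefix + entry + "\n")
    return True
```

Call ensure_gitignored(Path.cwd()) from the project root after bernstein init.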
Local configuration file
Create bernstein.yaml in your project root:
# bernstein.yaml
cli: auto # auto-detect installed agent (claude|codex|gemini|qwen)
model: sonnet # default model
max_agents: 4 # concurrent agents; tune based on your API tier
budget: 5.00 # hard spending cap in USD (optional)
# Override model per role
role_model_policy:
  docs:
    model: haiku
  backend:
    model: sonnet
  architect:
    model: opus
# Share context files with all agents
context_files:
  - README.md
  - docs/architecture/ARCHITECTURE.md
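The per-role overrides layer on top of the top-level model. The sketch below shows that lookup as we read the config: a role listed under role_model_policy gets its own model, and any other role falls back to model. resolve_model is a hypothetical helper for illustration, not Bernstein's internal code. The dict mirrors the YAML above so the example runs without a YAML parser.

```python
from typing import Any

# In-memory mirror of the bernstein.yaml above (hypothetical form).
CONFIG: dict[str, Any] = {
    "cli": "auto",
    "model": "sonnet",
    "max_agents": 4,
    "budget": 5.00,
    "role_model_policy": {
        "docs": {"model": "haiku"},
        "backend": {"model": "sonnet"},
        "architect": {"model": "opus"},
    },
}


def resolve_model(config: dict[str, Any], role: str) -> str:
    """Return the role's override if one exists, else the top-level default."""
    policy = config.get("role_model_policy", {}).get(role, {})
    return policy.get("model", config["model"])


for role in ("docs", "architect", "qa"):
    print(role, "->", resolve_model(CONFIG, role))
```

A role such as qa, which has no entry under role_model_policy, falls back to sonnet.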
Verify the install
bernstein doctor # checks dependencies, API keys, git setup
bernstein status # shows task server state
Smoke test
Run the zero-config demo to confirm everything works end-to-end before using it on real code:
Expected: all 3 tasks complete and a summary table prints with elapsed time and cost.
CI/CD — GitHub Actions
Single-shot plan execution
Run a plan file on demand (manually from the Actions tab, via workflow_dispatch):
# .github/workflows/bernstein.yml
name: Bernstein Agent Run
on:
  workflow_dispatch:
    inputs:
      plan:
        description: "Plan file (relative to repo root)"
        required: true
        default: "plans/ci-tasks.yaml"
      max_agents:
        description: "Max concurrent agents"
        required: false
        default: "2"
jobs:
  bernstein:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history needed for worktree operations
      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - name: Install Bernstein
        run: pip install bernstein
      - name: Install Claude Code (or your preferred CLI agent)
        run: npm install -g @anthropic-ai/claude-code
        # Alternatively, for Codex:
        # run: npm install -g @openai/codex
      - name: Run plan
        # Inputs go through env instead of inline ${{ }} to avoid shell injection
        run: |
          bernstein run "$PLAN" \
            --max-agents "$MAX_AGENTS" \
            --budget 10.00
        env:
          PLAN: ${{ inputs.plan }}
          MAX_AGENTS: ${{ inputs.max_agents }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          # OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          BERNSTEIN_LOG_JSON: "true"
          BERNSTEIN_NO_TUI: "true" # disable interactive TUI in CI
      - name: Upload state artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: bernstein-state-${{ github.run_id }}
          path: .sdd/
          include-hidden-files: true # .sdd/ is hidden; v4.4+ skips hidden files by default
          retention-days: 7
      - name: Post summary
        if: always()
        run: bernstein report --format markdown >> "$GITHUB_STEP_SUMMARY"
Using the official GitHub Action
If the Bernstein GitHub Action is available in the marketplace:
# .github/workflows/bernstein-action.yml
name: Bernstein (Action)
on:
  push:
    branches: [main]
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bernstein-ai/bernstein-action@v1
        with:
          plan: plans/ci-tasks.yaml
          max-agents: 2
          budget: 5.00
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
See docs/integrations/github-action.md for the full parameter reference.
Storing secrets
# Add secrets to your repository
gh secret set ANTHROPIC_API_KEY --body "sk-ant-..."
gh secret set OPENAI_API_KEY --body "sk-..."
Never commit API keys to your repository. Use GitHub Secrets for all provider credentials.
Verify CI output
A successful run uploads the .sdd/ state artifact and posts a markdown summary. In the Actions UI you should see:
- A green bernstein job
- .sdd/ uploaded under Artifacts
- A step summary with task counts, duration, and cost
If the job fails with a non-zero exit, check the "Run plan" step logs. The most common causes are missing secrets and expired API keys.
CI/CD — GitLab CI
Basic pipeline stage
# .gitlab-ci.yml
bernstein:
  image: python:3.12-slim
  stage: build
  timeout: 60 minutes
  before_script:
    - apt-get update -q && apt-get install -y -q git npm
    - pip install bernstein
    - npm install -g @anthropic-ai/claude-code
  script:
    - bernstein run plans/ci-tasks.yaml --max-agents 2 --budget 10.00
  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY # set in GitLab CI/CD settings
    BERNSTEIN_LOG_JSON: "true"
    BERNSTEIN_NO_TUI: "true" # disable interactive TUI in CI
  artifacts:
    paths:
      - .sdd/
    expire_in: 7 days
    when: always
Caching pip packages between runs
bernstein:
  image: python:3.12-slim
  stage: build
  cache:
    key: bernstein-pip-$CI_COMMIT_REF_SLUG
    paths:
      - .pip-cache/
  before_script:
    - apt-get update -q && apt-get install -y -q git npm
    - pip install --cache-dir .pip-cache bernstein
    - npm install -g @anthropic-ai/claude-code
  script:
    - bernstein run plans/ci-tasks.yaml --max-agents 2
  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
    BERNSTEIN_NO_TUI: "true"
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.pip-cache"
Scheduled pipeline
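# .gitlab-ci.yml (runs only when triggered by a pipeline schedule, e.g. weekly)
# maintenance is not a default GitLab stage, so it must be declared
stages:
  - build
  - maintenance
weekly-refactor:
  image: python:3.12-slim
  stage: maintenance
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  before_script:
    - apt-get update -q && apt-get install -y -q git npm
    - pip install bernstein
    - npm install -g @anthropic-ai/claude-code
  script:
    - bernstein run plans/weekly-maintenance.yaml --max-agents 3
  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
    BERNSTEIN_NO_TUI: "true"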
Protected variables (GitLab secrets)
- Go to Settings → CI/CD → Variables.
- Add ANTHROPIC_API_KEY with the Protected and Masked flags enabled.
- The variable is then available only to pipelines running on protected branches and tags.
Docker single-host
.env file
Copy the example and fill in your keys:
# .env (never commit this file)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
BERNSTEIN_AUTH_TOKEN=change-me-in-production
BERNSTEIN_MAX_AGENTS=4
BERNSTEIN_DASHBOARD_PASSWORD=change-me
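Before starting the container, it can help to check that .env sets the keys you need and that no template placeholders survive. A minimal sketch; the required-key list, the placeholder markers, and both helper names are assumptions for illustration. The parser handles only the simple KEY=VALUE lines and # comments shown above, not shell quoting.

```python
# Assumed minimum for the single-host setup above; adjust to your providers.
REQUIRED = ("ANTHROPIC_API_KEY", "BERNSTEIN_AUTH_TOKEN")
# Substrings showing a value was copied from the template but never filled in.
PLACEHOLDERS = ("change-me", "...")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def env_problems(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the file looks usable."""
    problems = [f"{key} is missing or empty" for key in REQUIRED if not env.get(key)]
    for key, value in env.items():
        if any(marker in value for marker in PLACEHOLDERS):
            problems.append(f"{key} still holds a placeholder value")
    return problems
```

For example, env_problems(parse_env_file(open(".env").read())) flags every line above that still reads sk-ant-..., change-me, or similar.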
Dockerfile
FROM python:3.12-slim
WORKDIR /workspace
# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
curl \
npm \
&& rm -rf /var/lib/apt/lists/*
# Install a CLI agent
RUN npm install -g @anthropic-ai/claude-code
# Install Bernstein
RUN pip install --no-cache-dir bernstein
# Non-root user for security; pre-create .sdd so a fresh named volume inherits its ownership
RUN useradd -m -u 1000 bernstein && mkdir -p /workspace/.sdd && chown -R bernstein /workspace
USER bernstein
# State directory
VOLUME ["/workspace/.sdd"]
ENV BERNSTEIN_BIND_HOST=0.0.0.0
ENV BERNSTEIN_PORT=8052
ENV BERNSTEIN_NO_TUI=true
EXPOSE 8052
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
CMD curl -f http://localhost:8052/health || exit 1
CMD ["bernstein", "conduct"]
docker build -t bernstein:latest .
docker run -d \
--name bernstein \
--env-file .env \
-p 8052:8052 \
-v bernstein-state:/workspace/.sdd \
-v "$(pwd)":/workspace/project \
bernstein:latest
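docker run -d returns as soon as the container starts, before the server is ready to answer. Scripts that run right after it can poll the same /health endpoint that the HEALTHCHECK uses. A minimal standard-library sketch; the function name, timeout, and poll interval are illustrative choices, and it treats any HTTP 2xx from /health as healthy, matching curl -f in the HEALTHCHECK.

```python
import http.client
import time
import urllib.request


def wait_for_health(url: str = "http://localhost:8052/health",
                    timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers HTTP 2xx or `timeout` seconds pass.

    Returns True once healthy, False if the deadline expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or an HTTP error status (urllib
            # raises HTTPError, an OSError subclass, for 4xx/5xx): not ready.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a deploy script, call wait_for_health() after docker run and exit non-zero if it returns False.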
Docker Compose cluster
The included docker-compose.yaml runs a full cluster: task server, orchestrator, scalable workers, PostgreSQL, Redis, Prometheus, and Grafana.
Setup
# Copy and edit the env file
cp .env.example .env
# Edit .env: set ANTHROPIC_API_KEY and BERNSTEIN_AUTH_TOKEN
# Start the full stack
docker compose up -d
# Scale workers (each worker claims tasks from the shared server)
docker compose up -d --scale bernstein-worker=4
# View logs
docker compose logs -f bernstein-server
docker compose logs -f bernstein-orchestrator
Service endpoints
| Service | URL | Purpose |
|---|---|---|
| Task server + dashboard | http://localhost:8052/dashboard | Web UI, task management |
| Task server API | http://localhost:8052 | REST API |
| Prometheus | http://localhost:9090 | Metrics |
| Grafana | http://localhost:3000 | Agent dashboards (admin/admin) |
Stopping and cleanup
docker compose down # stop containers, keep volumes
docker compose down -v # stop and delete all data volumes
Verify the cluster is healthy
# All containers should be "Up"
docker compose ps
# Task server health endpoint
curl http://localhost:8052/health
# Check the dashboard
open http://localhost:8052/dashboard # macOS; use xdg-open on Linux
Expected health response:
Kubernetes / Helm
Prerequisites
- Kubernetes 1.24+
- Helm 3.x
- kubectl configured for your cluster
- Persistent storage (EBS, NFS, local-path provisioner, etc.)
Install with Helm
# From the local chart
helm install bernstein ./deploy/helm/bernstein \
--namespace bernstein \
--create-namespace \
-f my-values.yaml
# Or add the Helm repo (when published)
helm repo add bernstein https://charts.bernstein.dev
helm repo update
helm install bernstein bernstein/bernstein \
--namespace bernstein \
--create-namespace
Provider API keys (Kubernetes secret)
kubectl create secret generic bernstein-provider-keys \
--namespace bernstein \
--from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
--from-literal=OPENAI_API_KEY="sk-..."
values.yaml — complete example
# my-values.yaml
image:
  repository: bernstein
  tag: latest
  pullPolicy: IfNotPresent
server:
  replicaCount: 1
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "4Gi"
      cpu: "2000m"
  persistence:
    enabled: true
    storageClass: "" # use cluster default
    size: 10Gi
  service:
    type: ClusterIP
    port: 8052
worker:
  replicaCount: 2
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "8Gi"
      cpu: "4000m"
  persistence:
    enabled: true
    storageClass: ""
    size: 20Gi # larger: holds git worktrees
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 20
    targetQueueDepth: "2" # scale up when 2+ tasks per worker
    targetCPUUtilizationPercentage: 70
providerKeys:
  existingSecret: bernstein-provider-keys
auth:
  enabled: true
  # existingSecret: my-auth-secret # use an existing secret instead
config:
  maxAgents: 6
  logLevel: INFO
  clusterEnabled: true
monitoring:
  prometheus:
    enabled: true
  grafana:
    enabled: true
    adminPassword: "change-me"
helm install bernstein ./deploy/helm/bernstein \
--namespace bernstein \
--create-namespace \
-f my-values.yaml
# Verify
kubectl get pods -n bernstein
kubectl port-forward -n bernstein svc/bernstein-server 8052:8052
For the full Helm chart parameter reference, see docs/operations/HELM_DEPLOYMENT.md.
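How many workers does targetQueueDepth: "2" produce? Assuming the chart maps it to a Kubernetes HorizontalPodAutoscaler target of type AverageValue on a queue-depth metric (an assumption; check the chart templates), the standard HPA formula desired = ceil(current × currentValue / targetValue) reduces to ceil(queueDepth / target) for that metric. With a CPU target as well, the HPA takes whichever metric asks for more replicas, then clamps to minReplicas..maxReplicas. A sketch of that arithmetic, ignoring the HPA's tolerance band and stabilization window; desired_workers is a hypothetical name.

```python
import math


def desired_workers(queue_depth: int, current_replicas: int, cpu_percent: float,
                    target_queue_depth: float = 2, target_cpu_percent: float = 70,
                    min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Replica count the HPA formula yields for the values above.

    Queue depth is treated as an AverageValue target (total / target);
    CPU as a Utilization target (current * usage / target). The HPA takes
    the larger of the two, then clamps to [min_replicas, max_replicas].
    """
    by_queue = math.ceil(queue_depth / target_queue_depth)
    by_cpu = math.ceil(current_replicas * cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, max(by_queue, by_cpu)))
```

For example, 9 queued tasks with 2 workers at 35% CPU gives max(ceil(9/2), ceil(2 × 35 / 70)) = max(5, 1) = 5 workers.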
Raw Kubernetes manifests (without Helm)
# bernstein-k8s.yaml (kept separate from the project's bernstein.yaml config file)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bernstein
  namespace: bernstein
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bernstein
  template:
    metadata:
      labels:
        app: bernstein
    spec:
      containers:
        - name: bernstein
          image: bernstein:latest
          ports:
            - containerPort: 8052
              name: http
            - containerPort: 9090
              name: metrics
          env:
            - name: BERNSTEIN_BIND_HOST
              value: "0.0.0.0"
            - name: BERNSTEIN_CLUSTER_ENABLED
              value: "true"
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: bernstein-provider-keys
                  key: ANTHROPIC_API_KEY
          volumeMounts:
            - name: state
              mountPath: /workspace/.sdd
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
      volumes:
        - name: state
          persistentVolumeClaim:
            claimName: bernstein-state
---
apiVersion: v1
kind: Service
metadata:
  name: bernstein
  namespace: bernstein
spec:
  selector:
    app: bernstein
  ports:
    - port: 8052
      targetPort: http
      name: http
    - port: 9090
      targetPort: metrics
      name: metrics
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bernstein-state
  namespace: bernstein
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Cloudflare cloud deployment
Bernstein can execute agents on Cloudflare's edge infrastructure instead of local processes. This is useful for teams that want centralized billing, isolated sandboxes for untrusted code, or global low-latency agent dispatch.
Quick start
# 1. Install wrangler
npm install -g wrangler
wrangler login
# 2. Deploy the agent Worker
bernstein cloud deploy --worker-name bernstein-agent
# 3. Authenticate with Bernstein Cloud
bernstein cloud login
# 4. Run orchestration in the cloud
bernstein cloud run "Add OAuth2 authentication" --max-agents 5 --budget 25.00
What runs where
| Component | Location | Purpose |
|---|---|---|
| Orchestrator | Local or your server | Deterministic tick loop, task scheduling |
| Agent execution | Cloudflare Workers / Sandboxes | Code generation, testing |
| Workspace files | Cloudflare R2 | File sync between local and cloud agents |
| Analytics / billing | Cloudflare D1 | Usage metering, quota enforcement |
| LLM response cache | Cloudflare Vectorize | Semantic prompt deduplication |
| Internal LLM (planning) | Cloudflare Workers AI | Free-tier models for task decomposition |
Environment variables for Cloudflare
| Variable | Description |
|---|---|
| CLOUDFLARE_ACCOUNT_ID | Cloudflare account identifier |
| CLOUDFLARE_API_TOKEN | API token with Workers, R2, D1, and Vectorize permissions |
| BERNSTEIN_CLOUD_API_KEY | API key for the bernstein.run hosted service |
For the full Cloudflare setup guide including R2 buckets, D1 databases, and Vectorize indexes, see the Cloudflare Setup documentation.
Team shared server
Run Bernstein on a dedicated server that multiple developers share. Each developer points their local tools at the shared task server.
Server setup (systemd)
# Create a system user
sudo useradd -r -s /bin/bash -d /opt/bernstein bernstein
sudo mkdir -p /opt/bernstein/workspace /var/lib/bernstein/.sdd
sudo chown -R bernstein:bernstein /opt/bernstein /var/lib/bernstein
# Install into a virtualenv
sudo -u bernstein python3.12 -m venv /opt/bernstein/venv
sudo -u bernstein /opt/bernstein/venv/bin/pip install bernstein
# Install a CLI agent system-wide so the bernstein user can run it (npm packages cannot go in the Python venv)
sudo npm install -g @anthropic-ai/claude-code
# /etc/systemd/system/bernstein.service
[Unit]
Description=Bernstein Orchestrator
After=network.target
Wants=network.target
[Service]
Type=simple
User=bernstein
Group=bernstein
WorkingDirectory=/opt/bernstein/workspace
ExecStart=/opt/bernstein/venv/bin/bernstein conduct
ExecStop=/opt/bernstein/venv/bin/bernstein stop --hard
Restart=on-failure
RestartSec=10
# Secrets — set via EnvironmentFile in production
EnvironmentFile=/etc/bernstein/env
Environment=BERNSTEIN_SDD_DIR=/var/lib/bernstein/.sdd
Environment=BERNSTEIN_BIND_HOST=0.0.0.0
Environment=BERNSTEIN_PORT=8052
Environment=BERNSTEIN_LOG_JSON=true
Environment=BERNSTEIN_MAX_AGENTS=8
# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
PrivateTmp=true
ReadWritePaths=/var/lib/bernstein /opt/bernstein/workspace
[Install]
WantedBy=multi-user.target
# /etc/bernstein/env (mode 0600, owned by bernstein)
ANTHROPIC_API_KEY=sk-ant-...
BERNSTEIN_AUTH_TOKEN=strong-random-secret
BERNSTEIN_DASHBOARD_PASSWORD=another-strong-password
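The placeholder secrets above should be replaced with random values. One way to generate them and create the file with restrictive permissions from the start is sketched below using Python's secrets module; write_env_file is a hypothetical helper, and the token lengths are illustrative choices. Run it as root (the path is under /etc), then chown the file to bernstein as noted above.

```python
import os
import secrets
from pathlib import Path


def write_env_file(path: Path, anthropic_key: str) -> dict[str, str]:
    """Write a systemd EnvironmentFile with fresh random secrets, mode 0600."""
    values = {
        "ANTHROPIC_API_KEY": anthropic_key,
        # token_urlsafe(32) draws 32 random bytes: 43 URL-safe characters.
        "BERNSTEIN_AUTH_TOKEN": secrets.token_urlsafe(32),
        "BERNSTEIN_DASHBOARD_PASSWORD": secrets.token_urlsafe(24),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create with 0600 so the secrets are never world-readable, even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.writelines(f"{key}={value}\n" for key, value in values.items())
    os.chmod(path, 0o600)  # also tighten a file that already existed
    return values
```

For example: write_env_file(Path("/etc/bernstein/env"), "sk-ant-..."), using your real key.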
sudo systemctl daemon-reload
sudo systemctl enable bernstein
sudo systemctl start bernstein
sudo journalctl -u bernstein -f
Connecting as a team member
Each developer configures their local bernstein.yaml to point at the shared server:
# bernstein.yaml (developer's local project)
server_url: http://bernstein.internal:8052
# or via env: BERNSTEIN_SERVER_URL=http://bernstein.internal:8052
# Submit tasks to the shared server without running a local orchestrator
bernstein task add "Implement login page" --role frontend --priority 2
bernstein status # see what the shared server is running
bernstein ps # list active agents
Reverse proxy (nginx)
Expose the dashboard behind TLS:
# /etc/nginx/sites-enabled/bernstein
server {
    listen 443 ssl;
    server_name bernstein.internal;
    ssl_certificate /etc/ssl/certs/bernstein.crt;
    ssl_certificate_key /etc/ssl/private/bernstein.key;
    # Dashboard
    location / {
        proxy_pass http://127.0.0.1:8052;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        # Required for SSE (live dashboard streaming)
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 600s;
    }
}
Multi-project setup
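Run separate orchestrators on different ports for project isolation. The template unit reads a per-instance environment file (/etc/bernstein/%i.env), so set BERNSTEIN_PORT there, for example 8052 for project-a and 8053 for project-b: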
# /etc/systemd/system/bernstein@.service (template unit)
[Unit]
Description=Bernstein Orchestrator — %i
After=network.target
[Service]
Type=simple
User=bernstein
WorkingDirectory=/opt/bernstein/projects/%i
EnvironmentFile=/etc/bernstein/%i.env
ExecStart=/opt/bernstein/venv/bin/bernstein conduct
[Install]
WantedBy=multi-user.target
# Start project-a on port 8052 and project-b on port 8053
sudo systemctl start bernstein@project-a
sudo systemctl start bernstein@project-b
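Each instance's port comes from its own /etc/bernstein/<name>.env file. A sketch that renders those files with sequential ports starting at 8052; render_project_envs is a hypothetical helper, and sharing one set of extra keys across all projects is an assumption you may not want.

```python
from typing import Optional


def render_project_envs(projects: list[str], base_port: int = 8052,
                        shared: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Map "<project>.env" file names to file contents, one port per project.

    Ports follow list order: base_port, base_port + 1, and so on.
    """
    files = {}
    for offset, name in enumerate(projects):
        lines = [f"BERNSTEIN_PORT={base_port + offset}"]
        lines += [f"{key}={value}" for key, value in (shared or {}).items()]
        files[f"{name}.env"] = "\n".join(lines) + "\n"
    return files
```

Keep the project list order stable: inserting a project in the middle shifts the ports of every project after it. Write each file into /etc/bernstein/ with mode 0600, like the single-server env file.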
Environment variables
| Variable | Default | Description |
|---|---|---|
| ANTHROPIC_API_KEY | — | Claude API key |
| OPENAI_API_KEY | — | OpenAI / Codex API key |
| GOOGLE_API_KEY | — | Gemini API key |
| BERNSTEIN_SERVER_URL | http://127.0.0.1:8052 | Task server URL (for remote workers) |
| BERNSTEIN_BIND_HOST | 127.0.0.1 | Server bind address |
| BERNSTEIN_PORT | 8052 | Server port |
| BERNSTEIN_MAX_AGENTS | 6 | Max concurrent agents |
| BERNSTEIN_AUTH_TOKEN | — | Inter-node auth secret (cluster mode) |
| BERNSTEIN_DASHBOARD_PASSWORD | — | Dashboard HTTP auth password |
| BERNSTEIN_STORAGE_BACKEND | memory | memory, postgres, or redis |
| BERNSTEIN_DATABASE_URL | — | PostgreSQL DSN (e.g. postgresql://user:pass@host/db) |
| BERNSTEIN_REDIS_URL | — | Redis URL (e.g. redis://localhost:6379/0) |
| BERNSTEIN_CLUSTER_ENABLED | false | Enable multi-node cluster mode |
| BERNSTEIN_LOG_LEVEL | INFO | Log verbosity (DEBUG/INFO/WARNING/ERROR) |
| BERNSTEIN_LOG_JSON | false | Emit JSON log lines (for log aggregators) |
| BERNSTEIN_BUDGET | — | Hard spending cap in USD |
| BERNSTEIN_TICK_INTERVAL | 5 | Orchestrator tick interval in seconds |
| BERNSTEIN_SKIP_GATES | — | Skip quality gates (requires BERNSTEIN_SKIP_GATE_REASON) |
| BERNSTEIN_NO_TUI | — | Disable interactive TUI (useful in CI) |
| BERNSTEIN_QUIET | — | Suppress all non-error output |
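When wrapping Bernstein in your own tooling, it helps to read these variables with the same defaults. A sketch covering a subset of the table; the Settings class is hypothetical and is not Bernstein's own config loader. Treating "1", "true", "yes", and "on" as enabled for boolean flags is also an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


def _flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    # Assumption: "1", "true", "yes", "on" (any case) mean enabled.
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    server_url: str
    bind_host: str
    port: int
    max_agents: int
    storage_backend: str
    cluster_enabled: bool
    log_level: str
    log_json: bool
    tick_interval: float
    budget: Optional[float]

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read settings, falling back to the defaults in the table above."""
        budget = env.get("BERNSTEIN_BUDGET")
        return cls(
            server_url=env.get("BERNSTEIN_SERVER_URL", "http://127.0.0.1:8052"),
            bind_host=env.get("BERNSTEIN_BIND_HOST", "127.0.0.1"),
            port=int(env.get("BERNSTEIN_PORT", "8052")),
            max_agents=int(env.get("BERNSTEIN_MAX_AGENTS", "6")),
            storage_backend=env.get("BERNSTEIN_STORAGE_BACKEND", "memory"),
            cluster_enabled=_flag(env, "BERNSTEIN_CLUSTER_ENABLED"),
            log_level=env.get("BERNSTEIN_LOG_LEVEL", "INFO").upper(),
            log_json=_flag(env, "BERNSTEIN_LOG_JSON"),
            tick_interval=float(env.get("BERNSTEIN_TICK_INTERVAL", "5")),
            budget=float(budget) if budget else None,
        )
```

Passing an explicit mapping, as in Settings.from_env({"BERNSTEIN_PORT": "8053"}), makes the defaults easy to test without touching the real environment.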
Zero-downtime upgrades (blue-green)
Bernstein supports blue-green deployments to upgrade the server without dropping in-flight tasks. The mechanism swaps the .sdd/ symlink between two parallel state directories (.sdd-blue/ and .sdd-green/), letting the new version warm up before traffic switches.
How it works
On switch_traffic(), the .sdd/ symlink is atomically re-pointed at .sdd-green/. If the health check fails, rollback() re-points it back to .sdd-blue/.
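An atomic re-point is usually done by creating the new symlink under a temporary name in the same directory and renaming it over the old one. On POSIX filesystems rename(2) replaces the destination in one step, so readers see either the old link or the new one, never a missing .sdd/. The Python sketch below illustrates that general technique only; it is not taken from bernstein.core.blue_green, and the temporary-name scheme is our choice.

```python
import os
from pathlib import Path


def repoint_symlink(link: Path, target: Path) -> None:
    """Atomically make `link` point at `target` (POSIX rename semantics)."""
    # Same directory as `link`, so the rename never crosses filesystems.
    tmp = link.with_name(f"{link.name}.tmp-{os.getpid()}")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # leftover from an interrupted earlier swap
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename(2): swaps the link in a single step
```

Switching is repoint_symlink(Path(".sdd"), Path(".sdd-green")); rolling back is the same call with .sdd-blue.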
Python API
from pathlib import Path
from bernstein.core.blue_green import BlueGreenConfig, BlueGreenDeployment

cfg = BlueGreenConfig(
    health_check_url="http://127.0.0.1:8052/status",
    rollback_on_error=True,
    switch_delay_seconds=10,
)
deploy = BlueGreenDeployment(cfg, base_dir=Path("."))
# 1. Prepare the green environment with the new version
green_path = deploy.prepare_green("2.1.0")
# 2. Start the new server process pointing at green_path
# ... start bernstein with BERNSTEIN_SDD_DIR=green_path ...
# 3. Check health
if deploy.health_check():
    deploy.switch_traffic()  # symlink: .sdd/ → .sdd-green/
else:
    deploy.rollback()  # stays on blue; green is discarded
Upgrade procedure (bare metal)
# 1. Install the new version alongside the old, in its own virtualenv
python3.12 -m venv /opt/bernstein/v2.1.0
/opt/bernstein/v2.1.0/bin/pip install bernstein==2.1.0
# 2. Start the new server on a staging port
BERNSTEIN_PORT=8053 BERNSTEIN_SDD_DIR=.sdd-green \
/opt/bernstein/v2.1.0/bin/bernstein conduct &
# 3. Verify it is healthy
curl http://127.0.0.1:8053/status
# 4. Switch traffic via the Python API or CLI
python3 -c "
from pathlib import Path
from bernstein.core.blue_green import BlueGreenConfig, BlueGreenDeployment
cfg = BlueGreenConfig(health_check_url='http://127.0.0.1:8053/status')
BlueGreenDeployment(cfg, Path('.')).switch_traffic()
"
# 5. Stop the old server, then point clients (and any reverse proxy) at port 8053
kill $(cat .sdd-blue/runtime/server.pid)
Check deployment status
status = deploy.status()
print(status.active) # "blue" or "green"
print(status.blue_version) # "2.0.0"
print(status.green_version) # "2.1.0"
print(status.healthy) # True / False
Upgrading
- Stop the running instance: bernstein stop
- Back up state: cp -r .sdd .sdd.backup-$(date +%Y%m%d)
- Install the new version: pip install --upgrade bernstein
- Start: bernstein run
State format is forward-compatible between minor versions. For major version upgrades, check docs/migrations/migration-guides.md for breaking changes.
To roll back: pip install bernstein==<previous-version> and restore .sdd/ from the dated .sdd.backup-YYYYMMDD/ copy.
For zero-downtime upgrades on production servers, use the blue-green procedure above.
Troubleshooting deployments
Task server health check fails on startup
The server may be waiting for PostgreSQL or Redis to be ready. Check dependencies first:
# Docker Compose
docker compose logs postgres
docker compose logs redis
# Kubernetes
kubectl logs -n bernstein -l app.kubernetes.io/component=postgresql
kubectl get events -n bernstein --sort-by='.lastTimestamp'
If the server crashes immediately, check the server log directly:
# Local
cat .sdd/runtime/logs/server.log
# Docker
docker logs bernstein-server
# Kubernetes
kubectl logs -n bernstein deploy/bernstein-server
Workers are not claiming tasks
Check 1: Auth token mismatch. Every node must share the same BERNSTEIN_AUTH_TOKEN:
# Docker Compose — inspect worker env
docker compose exec bernstein-worker env | grep AUTH_TOKEN
# Kubernetes — decode the secret
kubectl get secret bernstein-auth -n bernstein -o jsonpath='{.data.BERNSTEIN_AUTH_TOKEN}' | base64 -d
Check 2: Worker cannot reach the task server. Verify that BERNSTEIN_SERVER_URL is set correctly and that the server is reachable from the worker:
# From inside the worker container
docker compose exec bernstein-worker curl -s http://bernstein-server:8052/health
# Kubernetes
kubectl exec -n bernstein deploy/bernstein-worker -- curl -s http://bernstein-server:8052/health
Check 3: No open tasks. If the backlog is empty, workers have nothing to do:
Port 8052 is already in use
A previous Bernstein session did not shut down cleanly. Find and stop it:
# Local — use Bernstein's own stop command
bernstein stop --force
# Or find the PID manually
cat .sdd/runtime/pids/server.json
kill <pid>
# Or kill by port
lsof -ti:8052 | xargs kill -9
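To check from a script whether something is still listening on the port before starting a new session, a connect attempt is enough. A standard-library sketch; port_in_use is a hypothetical helper name, and it only detects a listener on the given host, not which process owns it (lsof, as above, tells you that).

```python
import socket


def port_in_use(port: int = 8052, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

A start script might call bernstein stop --force only when port_in_use() returns True.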
Agents spawn but exit immediately
Agents exit when they have no work or cannot authenticate. Check logs:
bernstein logs -f # follow all agent output
bernstein logs -a claude # filter by agent name
tail -f .sdd/runtime/logs/*.log # raw log files
Common causes:
| Symptom | Likely cause | Fix |
|---|---|---|
| AuthenticationError in log | API key missing or expired | Re-export ANTHROPIC_API_KEY etc. |
| Agent exits with code 1 immediately | CLI not authenticated | Run claude login / codex login |
| Connection refused to task server | Server not started | Check bernstein status |
| Agent claims task then fails it | Task prompt too long | Reduce scope in task config |
Tasks stuck in "claimed" status
An agent crashed before reporting completion. The task stays claimed until the janitor reclaims it (default: 5 minutes) or you force a reset:
Stale claimed tasks appear in bernstein status with a "claimed for >5m" annotation.
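The stale-claim rule is easy to reason about: a task becomes reclaimable once it has been claimed for longer than the janitor timeout. The sketch below applies that check to a hypothetical list of (task id, claimed-at) records. The record shape is an assumption for illustration, not Bernstein's task schema; the 5-minute default and the strict "longer than" comparison follow the text above.

```python
from collections.abc import Iterable
from datetime import datetime, timedelta, timezone
from typing import Optional

JANITOR_TIMEOUT = timedelta(minutes=5)


def stale_claims(claims: Iterable[tuple[str, datetime]],
                 now: Optional[datetime] = None,
                 timeout: timedelta = JANITOR_TIMEOUT) -> list[str]:
    """Return IDs of tasks claimed for longer than `timeout` (timezone-aware times)."""
    now = now or datetime.now(timezone.utc)
    return [task_id for task_id, claimed_at in claims if now - claimed_at > timeout]
```

A task claimed exactly 5 minutes ago is not yet stale. One claimed 6 minutes ago is.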
Docker volume permissions
If the server cannot write to .sdd/, the named volume may be owned by root:
docker compose exec bernstein-server ls -la /workspace/.sdd
# If root-owned:
docker compose exec --user root bernstein-server chown -R bernstein:bernstein /workspace/.sdd
Kubernetes pod stuck in Pending
Usually a resource or PersistentVolumeClaim issue:
If PVC is in Pending, your cluster may not have a default StorageClass:
kubectl get storageclass
# Set one as default if none exists:
kubectl patch storageclass <name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Grafana shows no data
Check that Prometheus is scraping the task server:
# Docker Compose — open Prometheus targets page
open http://localhost:9090/targets
# Kubernetes
kubectl port-forward -n bernstein svc/prometheus 9090:9090 &
open http://localhost:9090/targets
If bernstein-server shows as DOWN, the metrics endpoint is not reachable. Verify the server is running and the Prometheus scrape config points to the correct host and port.
Still stuck?
- Run bernstein doctor; it checks the most common issues automatically.
- Check the Troubleshooting guide for agent-level issues (API errors, quality gate failures, cost overruns).
- Open an issue at sipyourdrink-ltd/bernstein with the output of bernstein doctor --json.