Autofix¶

Audience: teams adopting Bernstein for self-driving CI repair on its own pull requests.

What: bernstein autofix is a long-running daemon that watches a configured set of GitHub repositories, finds failed CI runs on PRs that Bernstein itself opened, classifies each failure into a routing bucket, and dispatches a fresh deterministic Bernstein run scoped to the failing log. The repair commit lands on the same branch, with no human intervention.

Why: When Bernstein opens 30 PRs/day, transient flakiness, formatter churn, and dependency lint become the long pole. Autofix closes the loop without you. The classifier is keyword-based and deterministic, so every escalation is auditable. See src/bernstein/core/autofix/__init__.py:1-30.

Cross-link: Quality pipeline for the in-run gate semantics; autofix is the post-push counterpart that handles failures CI catches after the orchestrator has already let go.

What autofix does, end-to-end¶

Source: src/bernstein/core/autofix/dispatcher.py:1-34. Per dispatched attempt:

1. Tick every poll_interval_seconds (default 60 s). For each configured repo, list open PRs with currently-failing CI runs.
2. Ownership gate - read the PR description / commit trailers, require a bernstein-session-id: <id> line written by bernstein pr that resolves to a known local session (src/bernstein/core/autofix/ownership.py:1-15).
3. Label gate - require the bernstein-autofix label on the PR. Removing the label aborts in-flight attempts within one tick.
4. Cap check - fail-fast needs-human once the PR has burned MAX_ATTEMPTS_PER_PUSH = 3 attempts on the active push SHA (src/bernstein/core/autofix/config.py:60-61).
5. Cost check - cost_cap_usd per repo. Default $5. An attempt that would breach the cap is aborted and a comment is posted.
6. Classifier - keyword sweep over the failing log:
   - security (CodeQL, CVE, leaked-secret, dependabot) → opus.
   - flaky (timeouts, deadlock, rate-limit, 5xx) → sonnet.
   - config (lint, mypy, ruff, missing env, syntax) → haiku.

   Highest-priority match wins, so a security signal always beats flaky. Source: src/bernstein/core/autofix/classifier.py:1-18.
7. Audit open - append autofix.attempt.start to the HMAC chain.
8. Goal synthesis - deterministic short prompt assembled from PR metadata + truncated log (log_byte_budget, default 64 KiB).
9. Spawn - invoke the dispatch hook (production: bernstein run with the synthesised goal and bandit-selected model). Tests inject a stub.
10. Audit close - append autofix.attempt.end with outcome, commit_sha, cost_usd. Both events share an attempt_id so bernstein audit joins them.
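The classifier step can be pictured as an ordered keyword table in which the first matching bucket wins. The sketch below is illustrative only: the bucket names and models come from the list above, but the regular expressions, the `BUCKETS` table, and the `classify` function are assumptions, not the contents of classifier.py. Returning `None` for an unmatched log is also an assumption.

```python
import re
from typing import Optional

# Hypothetical keyword table, ordered from highest to lowest priority.
# Bucket names and models follow the documentation; the patterns are illustrative.
BUCKETS = [
    ("security", "opus", re.compile(r"codeql|cve-\d{4}-\d+|leaked[- ]secret|dependabot", re.I)),
    ("flaky", "sonnet", re.compile(r"timed?[ -]?out|deadlock|rate[- ]limit|\b5\d\d\b", re.I)),
    ("config", "haiku", re.compile(r"\blint\b|mypy|ruff|missing env|syntax ?error", re.I)),
]


def classify(log: str) -> Optional[tuple[str, str]]:
    """Return (bucket, model) for the first bucket whose pattern matches, else None."""
    for bucket, model, pattern in BUCKETS:
        if pattern.search(log):
            return bucket, model
    return None
```

Because the table is scanned in priority order, a log that mentions both a CVE and a timeout routes to security, which matches the "security always beats flaky" rule.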

The dispatcher never pushes to git or comments on PRs directly - those side-effects flow through the ActionAdapter protocol so the daemon is fully testable without the network (src/bernstein/core/autofix/dispatcher.py:74-85).
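To make the testability claim concrete, here is a minimal sketch of how a side-effect boundary like this can look. Only the name ActionAdapter comes from the text; the method names, the `RecordingAdapter` test double, and the `escalate` helper are hypothetical and may not match dispatcher.py.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ActionAdapter(Protocol):
    """Side-effect boundary; the method names here are hypothetical."""

    def post_comment(self, repo: str, pr_number: int, body: str) -> None: ...

    def add_label(self, repo: str, pr_number: int, label: str) -> None: ...


@dataclass
class RecordingAdapter:
    """Test double that records calls instead of touching the network."""

    calls: list = field(default_factory=list)

    def post_comment(self, repo: str, pr_number: int, body: str) -> None:
        self.calls.append(("comment", repo, pr_number, body))

    def add_label(self, repo: str, pr_number: int, label: str) -> None:
        self.calls.append(("label", repo, pr_number, label))


def escalate(adapter: ActionAdapter, repo: str, pr_number: int) -> None:
    """Mark a PR as needing a human; all side effects go through the adapter."""
    adapter.add_label(repo, pr_number, "needs-human")
    adapter.post_comment(repo, pr_number, "autofix: attempt cap reached")
```

A test can pass `RecordingAdapter()` and assert on `calls`, while production wiring would supply an adapter that talks to GitHub.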

`bernstein autofix` group¶

Source: src/bernstein/cli/commands/autofix_cmd.py.

`autofix start`¶

$ bernstein autofix start [--repo OWNER/REPO ...]
                          [--config /path/to/autofix.toml]
                          [--foreground]
                          [--once]

By default the command double-forks the daemon and returns the PID of the long-running grandchild. Use the systemd / launchd integration (bernstein daemon install) so the OS owns restart logic (src/bernstein/cli/commands/autofix_cmd.py:1-13).

Flags:

--repo OWNER/REPO - restrict the tick to specific repos. Repeatable. Unknown repos (not in autofix.toml) emit a warning but don't abort.
--config <path> - override the default autofix.toml location.
--foreground - stay attached. Use under systemd; never daemonise twice.
--once - single tick then exit. Useful for cron-driven setups that prefer external scheduling over a long-lived daemon.

`autofix stop`¶

$ bernstein autofix stop [--timeout 10]

Sends SIGTERM to the PID stored in .sdd/runtime/autofix.pid, waits up to --timeout seconds for clean exit, then clears the pid file (src/bernstein/core/autofix/daemon.py:388-417). Raises DaemonNotRunningError if no live daemon is found.
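The stop sequence described above (read the pid file, send SIGTERM, wait, clear the file) can be sketched as follows. This is a POSIX-only illustration under stated assumptions: `stop_daemon` and `_alive` are hypothetical names, and only DaemonNotRunningError and the overall sequence come from the text. The real logic lives in daemon.py and may differ.

```python
import os
import signal
import time
from pathlib import Path


class DaemonNotRunningError(RuntimeError):
    """Raised when no live daemon can be found (name taken from the text above)."""


def _alive(pid: int) -> bool:
    """Check whether a process exists without delivering a real signal."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_daemon(pid_file: Path, timeout: float = 10.0) -> bool:
    """Send SIGTERM, wait up to `timeout` seconds, then clear the pid file.

    Returns True if the process exited within the timeout.
    """
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        raise DaemonNotRunningError(f"no usable pid in {pid_file}") from None
    if not _alive(pid):
        pid_file.unlink(missing_ok=True)  # stale pid file
        raise DaemonNotRunningError(f"pid {pid} is not running")

    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline and _alive(pid):
        time.sleep(0.1)
    exited = not _alive(pid)
    pid_file.unlink(missing_ok=True)
    return exited
```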

`autofix status`¶

$ bernstein autofix status [--limit 20] [--json] [--watch]

Prints daemon up/down state, last-tick timestamp, and the most recent N attempts. --json emits the full snapshot. --watch tails new entries as they land in the JSONL status log.

autofix daemon: running (pid=4711)
last tick:      Wed May  4 14:23:11 2026

Recent attempts (newest first):
  sipyourdrink-ltd/bernstein#1042  attempt=2  outcome=success     classifier=flaky    cost=$0.0314
  sipyourdrink-ltd/bernstein#1041  attempt=1  outcome=needs_human classifier=security cost=$0.0000

`autofix attach`¶

$ bernstein autofix attach [--limit 200]

Replays the last N attempts as JSON-per-line, then tails new entries - same surface attach provides for chat-control sessions (src/bernstein/cli/commands/autofix_cmd.py:398-436). This is the "resume from any terminal" handoff used by the chat-control surfaces.
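The replay half of attach amounts to reading the last N lines of a JSONL file. A minimal sketch, assuming one JSON object per line; `replay_last` is a hypothetical helper, not the function in autofix_cmd.py, and silently skipping malformed lines is an assumption.

```python
import json
from collections import deque
from pathlib import Path
from typing import Iterator


def replay_last(path: Path, limit: int = 200) -> Iterator[dict]:
    """Yield the last `limit` JSONL records, oldest first, skipping malformed lines."""
    with path.open(encoding="utf-8") as fh:
        tail = deque(fh, maxlen=limit)  # keeps only the final `limit` lines
    for line in tail:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
```

Tailing new entries would then be a matter of polling the file for lines appended after the replayed position.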

Trigger conditions¶

Automatic (daemon picks up on tick):

PR is open on a watched repo.
PR has the bernstein-autofix label (src/bernstein/core/autofix/config.py:57).
PR description / commits contain bernstein-session-id: <id> matching a local session (src/bernstein/core/autofix/ownership.py:35-40).
One or more required check runs failed on the latest push.
Active push SHA has burned fewer than MAX_ATTEMPTS_PER_PUSH (3) autofix attempts.
Repo's cost_cap_usd budget would not be breached by the attempt.
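The six conditions above compose into a single eligibility check. The sketch below is a hypothetical model of that check: `PullRequestState`, `skip_reason`, and especially `projected_cost_usd` (the text does not say how the daemon estimates an attempt's cost) are assumptions. The return values reuse the outcome names from this page, and a cap of 0 is treated as unlimited, as the configuration table states.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ATTEMPTS_PER_PUSH = 3  # value stated in the text above


@dataclass(frozen=True)
class PullRequestState:
    """Hypothetical snapshot of what the daemon knows about one PR at tick time."""

    is_open: bool
    labels: frozenset
    session_id_known: bool  # trailer found and resolves to a local session
    failed_required_checks: int
    attempts_on_push: int
    projected_cost_usd: float  # assumed estimate; not described in the docs


def skip_reason(
    pr: PullRequestState,
    cost_cap_usd: float = 5.0,
    label: str = "bernstein-autofix",
) -> Optional[str]:
    """Return None when the PR is eligible, else the outcome of the first failing gate."""
    if not pr.is_open or pr.failed_required_checks == 0:
        return "skipped"
    if not pr.session_id_known or label not in pr.labels:
        return "skipped"
    if pr.attempts_on_push >= MAX_ATTEMPTS_PER_PUSH:
        return "needs_human"
    if cost_cap_usd > 0 and pr.projected_cost_usd > cost_cap_usd:
        return "cost_capped"
    return None
```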

Manual (no autofix):

Removing the bernstein-autofix label aborts in-flight attempts on the next tick.
After 3 attempts, the daemon adds a needs-human label and stops retrying that push SHA.
security-classified failures still trigger autofix (with opus). The exception is a failure pattern that indicates a CVE in a transitive dependency, where the fix requires human judgement. The keyword classifier is a heuristic, so escalating manually by removing the label is always available.

Outcomes recorded on the outcome Prometheus label (src/bernstein/core/autofix/dispatcher.py:60-66):

Outcome	Meaning
`success`	Spawn produced a commit; CI is expected to flip green.
`failed`	Spawn ran but the commit didn't fix CI.
`cost_capped`	Aborted before dispatch; would have breached `cost_cap_usd`.
`needs_human`	Attempt cap reached; `needs-human` label added.
`skipped`	Filtered by ownership/label/dedup; no work performed.

Orchestrator interaction¶

Each attempt spawns a fresh deterministic Bernstein run via the configured DispatchHook (src/bernstein/core/autofix/dispatcher.py:87-100). It does not reuse or attach to the original session that opened the PR. Concretely:

A new top-level bernstein run --goal "<synthesised>" --model <bandit> is invoked in a clean worktree on the PR head branch.
The run inherits the autofix-injected cost cap (cost_cap_usd). The operator's existing cost_cap from bernstein.yaml still applies alongside it, so whichever limit bites first stops the run.
The run inherits the lifecycle/notification/quality-gate stack exactly like any other run; failures bubble up through the same post_task hooks.
Successful attempts produce a commit on the PR's head branch. The daemon honours allow_force_push per repo (src/bernstein/core/autofix/config.py:88-90); when false it falls back to a merge commit on the branch tip.
The synthesised goal is the only context the spawned run gets - the truncated log and PR metadata. This is intentional, so an attempt is reproducible by hand from the audit record.
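Reproducibility here follows from the goal being a pure function of the PR metadata and the truncated log. A minimal sketch under stated assumptions: the prompt wording and the function names are hypothetical, and "head-truncated" is read as dropping the head and keeping the final log_byte_budget bytes, which the source does not confirm.

```python
import hashlib


def truncate_log(log: bytes, budget: int = 65536) -> bytes:
    """Keep at most `budget` bytes from the end of the log (assumed truncation direction)."""
    if budget <= 0:
        return b""
    return log[-budget:]


def synthesise_goal(repo: str, pr_number: int, push_sha: str, log: bytes, budget: int = 65536) -> str:
    """Build a short prompt from PR metadata and the truncated log; no clock or randomness."""
    excerpt = truncate_log(log, budget).decode("utf-8", errors="replace")
    return (
        f"Fix the failing CI run on {repo}#{pr_number} at {push_sha}.\n"
        "Failing log excerpt:\n"
        f"{excerpt}"
    )


def goal_hash(goal: str) -> str:
    """Stable fingerprint of a goal, e.g. for comparison against an audit record."""
    return hashlib.sha256(goal.encode("utf-8")).hexdigest()
```

Given the same inputs, `synthesise_goal` returns the same string, so a goal rebuilt by hand can be compared against the goal hash stored in the audit record.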

Configuration¶

autofix.toml lives at $XDG_CONFIG_HOME/bernstein/autofix.toml (default ~/.config/bernstein/autofix.toml). Source: src/bernstein/core/autofix/config.py.

poll_interval_seconds = 60
log_byte_budget       = 65536

[[repo]]
name             = "sipyourdrink-ltd/bernstein"
cost_cap_usd     = 5.0
allow_force_push = false
label            = "bernstein-autofix"

[[repo]]
name         = "acme-org/example"
cost_cap_usd = 2.0

Top-level keys:

Key	Default	Meaning
`poll_interval_seconds`	60	Tick cadence. Lower = faster reaction, more API quota burn.
`log_byte_budget`	65536	Max bytes of failing log fed to classifier + goal synth (head-truncated).
`[[repo]]`	(req)	At least one repo entry needed; daemon refuses to start otherwise.

Per-repo keys:

Key	Default	Meaning
`name`	(req)	`OWNER/REPO`. Empty/missing = ValueError.
`cost_cap_usd`	5.0	USD ceiling per attempt. 0 = unlimited (don't).
`label`	`bernstein-autofix`	Label that gates whether the daemon may touch a PR.
`allow_force_push`	false	If false, attempts merge-commit on branch tip instead of force-pushing.

Module-level constants worth knowing:

MAX_ATTEMPTS_PER_PUSH = 3 (src/bernstein/core/autofix/config.py:60-61). Hardcoded - not in TOML - because it's a safety rail, not a tuning knob.
SESSION_TRAILER_KEY = bernstein-session-id (src/bernstein/core/autofix/ownership.py:38-40).
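For orientation, the ownership check reduces to finding that trailer line and looking up its value. The sketch below assumes the trailer sits on its own line in the form `bernstein-session-id: <id>`. The regular expression and the `find_session_id` / `is_owned` helpers are hypothetical, not the implementation in ownership.py.

```python
import re
from typing import Iterable, Optional

SESSION_TRAILER_KEY = "bernstein-session-id"  # constant named in the text above

# One trailer per line: "<key>: <id>", tolerating trailing spaces and CRLF endings.
_TRAILER_RE = re.compile(
    rf"^{re.escape(SESSION_TRAILER_KEY)}:[ \t]*(\S+)[ \t\r]*$",
    re.MULTILINE,
)


def find_session_id(*texts: str) -> Optional[str]:
    """Return the first session id found in the PR body or commit messages."""
    for text in texts:
        match = _TRAILER_RE.search(text)
        if match:
            return match.group(1)
    return None


def is_owned(pr_body: str, commit_messages: Iterable[str], known_sessions: set) -> bool:
    """True only when a trailer exists and resolves to a known local session."""
    session_id = find_session_id(pr_body, *commit_messages)
    return session_id is not None and session_id in known_sessions
```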

Observability¶

Three places to look for autofix activity:

1. Status JSONL - `bernstein autofix attach` and `--watch`¶

.sdd/runtime/autofix.jsonl - one line per dispatched attempt (src/bernstein/core/autofix/daemon.py:50, 159-189):

{
  "ts": 1714843200.123,
  "attempt_id": "abc12345",
  "repo": "sipyourdrink-ltd/bernstein",
  "pr_number": 1042,
  "push_sha": "deadbeef...",
  "run_id": "8765432",
  "session_id": "ses-...",
  "attempt_index": 2,
  "outcome": "success",
  "classifier": "flaky",
  "model": "sonnet",
  "cost_usd": 0.0314,
  "commit_sha": "feedface...",
  "reason": ""
}

2. Prometheus - `/metrics` endpoint¶

Two counters (registered in src/bernstein/core/autofix/metrics.py):

autofix_attempts_total{repo, outcome, classifier} - increments per dispatched attempt. Labels match the Status JSONL fields.
autofix_cost_usd_total{repo} - increments by per-attempt USD spend. cost_capped attempts also increment by their pre-cap spend.
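Since the counter labels match the Status JSONL fields, the same totals can be recomputed offline from the log as a cross-check. A minimal sketch: `summarise` is a hypothetical helper, and it assumes every line carries the `repo`, `outcome`, `classifier`, and `cost_usd` fields shown in the record above.

```python
import json
from collections import defaultdict
from typing import Iterable


def summarise(lines: Iterable[str]) -> tuple:
    """Recompute attempt counts per (repo, outcome, classifier) and spend per repo."""
    attempts = defaultdict(int)
    cost = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        attempts[(rec["repo"], rec["outcome"], rec["classifier"])] += 1
        cost[rec["repo"]] += float(rec.get("cost_usd", 0.0))
    return dict(attempts), dict(cost)
```

Passing an open file handle on .sdd/runtime/autofix.jsonl works, since iterating a text file yields its lines.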

3. Audit log - `bernstein audit`¶

Each attempt writes two HMAC-chained records (src/bernstein/core/autofix/dispatcher.py:24-28):

autofix.attempt.start - repo, PR, run_id, classifier, planned model, goal hash.
autofix.attempt.end - outcome, commit_sha, actual cost.

The two records are joined by attempt_id. The HMAC chain is what dr backup preserves, so a restored workspace can be queried for past autofix decisions even after the JSONL log has been rotated.
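To illustrate what "HMAC-chained" buys, the sketch below links each record's MAC to the previous one, so editing any earlier record invalidates everything after it. This is a generic construction under stated assumptions, not Bernstein's actual audit format: the record layout, the key handling, and all function names are hypothetical.

```python
import hashlib
import hmac
import json


def _mac(key: bytes, prev_mac: str, payload: dict) -> str:
    """MAC over the previous record's MAC plus a canonical encoding of this payload."""
    body = prev_mac.encode() + json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def append_record(chain: list, key: bytes, event: str, attempt_id: str, **fields) -> dict:
    """Append one event to the chain, binding it to the record before it."""
    prev_mac = chain[-1]["mac"] if chain else ""
    payload = {"event": event, "attempt_id": attempt_id, **fields}
    record = {**payload, "prev": prev_mac, "mac": _mac(key, prev_mac, payload)}
    chain.append(record)
    return record


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every MAC in order; any edited or reordered record fails the check."""
    prev_mac = ""
    for record in chain:
        payload = {k: v for k, v in record.items() if k not in ("prev", "mac")}
        if record["prev"] != prev_mac:
            return False
        if not hmac.compare_digest(record["mac"], _mac(key, prev_mac, payload)):
            return False
        prev_mac = record["mac"]
    return True


def join_attempt(chain: list, attempt_id: str) -> dict:
    """Pair the start and end events of one attempt by their shared attempt_id."""
    return {r["event"]: r for r in chain if r["attempt_id"] == attempt_id}
```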

Safety rails (read once, twice)¶

No greenfield work. Autofix only touches PRs Bernstein opened. The session-id trailer + label gate is a hard double-check.
Three-strike rule. A push SHA gets at most 3 attempts. Beyond that, the daemon adds needs-human and stays out of the way.
Spend cap. Per-repo cost_cap_usd is checked before dispatch - a runaway repo cannot drain the account.
Force-push off by default. Most teams want history they can bisect; force-push only if you have configured it explicitly.
No LLM in the scheduling loop. The dispatcher is plain Python and the classifier is regex - escalation decisions are reproducible (src/bernstein/core/autofix/dispatcher.py:1-9).

Code pointers¶

src/bernstein/cli/commands/autofix_cmd.py - CLI surface
src/bernstein/core/autofix/__init__.py:1-79 - package overview
src/bernstein/core/autofix/config.py:1-150 - TOML schema + defaults
src/bernstein/core/autofix/classifier.py:1-90 - keyword classifier (security/flaky/config)
src/bernstein/core/autofix/ownership.py:1-40 - session-id trailer + label gate
src/bernstein/core/autofix/gh_logs.py - gh run view --log-failed wrapper
src/bernstein/core/autofix/dispatcher.py:1-100 - per-attempt pipeline
src/bernstein/core/autofix/daemon.py:1-484 - process supervisor (start/stop/status/attach + tick_once)
src/bernstein/core/autofix/metrics.py - Prometheus counters
