Replay & Replay-Filter

bernstein replay re-displays the events from a past orchestration run so you can debug, diff, and reproduce. There are two replay surfaces:

  1. bernstein replay (source: cli/commands/advanced_cmd.py:876): the original replay, optionally with task-trace re-submission.
  2. bernstein replay-filter (source: cli/commands/replay_filter_cmd.py:164): a filtering wrapper around replay that adds --filter, --event-type, --agent, and --search.

Both read from the same underlying log on disk; the filter command is a strict superset of the basic command for read-only inspection. The basic command additionally supports task-trace replay, which re-creates a new task from a stored task trace and (optionally) compares the replay's result_summary against the original via a colour diff.

What replay does (and doesn't do)

Replay is a deterministic re-display of a past run's recorded events. Every event the orchestrator emitted (run_started, agent_spawned, task_claimed, task_completed, task_verification_failed, agent_reaped, run_completed) is replayed in order with its original timing offsets.

What replay does not do:

  • It does not re-execute external HTTP calls. Any HTTP traffic the original agents performed (LLM API calls, GitHub API writes, webhook deliveries) is captured in the log but not re-issued.
  • State mutations to remote services (a PR opened, a Slack message sent, a row inserted into your database) are not rolled back or repeated.
  • It does not re-create branches or worktrees. The git state is whatever your repo currently is.

For full re-execution of the same task with a (potentially different) model, use the task-trace mode: bernstein replay <task_id> --model opus. This re-submits the original task description (plus any --extra-context you provide) as a new task on the running server and waits for it to finish, then renders a diff between the original and the new result_summary. (cli/commands/advanced_cmd.py:841-873.)

Where replay state lives

.sdd/
  runs/
    <run_id>/
      replay.jsonl           # event log (one JSON event per line)
      session.json           # session metadata (started_at, git_branch, git_sha, config_hash)
  traces/
    <task_id>-<timestamp>.json  # per-task traces (used by task-trace replay)

  • The run-event log path constant: _REPLAY_JSONL = "replay.jsonl" (cli/commands/advanced_cmd.py:725).
  • Session metadata is parsed by read_session_replay_metadata() from core/runtime_state.py.
  • Task traces are loaded by core.traces.TraceStore (cli/commands/replay_filter_cmd.py:108-117).

The fingerprint shown after every replay is a SHA-256 hash of the canonicalized event sequence (core.recorder.compute_replay_fingerprint); identical event streams produce identical fingerprints, which is how you verify two runs really are the same.
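
To make the fingerprint idea concrete, here is a minimal sketch of hashing a canonicalized event sequence. It is not the code in core.recorder.compute_replay_fingerprint: the function name sketch_fingerprint, the choice of sorted-key compact JSON as the canonical form, and the sample event fields are all assumptions made for illustration.

```python
import hashlib
import json


def sketch_fingerprint(events):
    """Hash a list of event dicts into a stable SHA-256 hex digest.

    Assumption: "canonicalized" is modelled here as compact JSON with
    sorted keys. The real recorder may normalize events differently.
    """
    digest = hashlib.sha256()
    for event in events:
        line = json.dumps(event, sort_keys=True, separators=(",", ":"))
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # delimiter so event boundaries affect the hash
    return digest.hexdigest()


# Illustrative events; the field names are hypothetical.
run_a = [{"event": "run_started", "tick": 0}, {"event": "run_completed", "tick": 9}]
run_b = [{"tick": 0, "event": "run_started"}, {"tick": 9, "event": "run_completed"}]
run_c = list(reversed(run_a))

same = sketch_fingerprint(run_a) == sketch_fingerprint(run_b)       # key order is ignored
reordered = sketch_fingerprint(run_a) == sketch_fingerprint(run_c)  # event order is not
```

In this sketch, key order inside an event does not affect the digest, while the order of events does, which matches the intent of comparing event streams rather than raw bytes.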


bernstein replay

Synopsis: bernstein replay RUN_ID_OR_TASK_ID [flags]

Flags: (source: cli/commands/advanced_cmd.py:876-906)

Flag | Default | Meaning
RUN_ID_OR_TASK_ID | required | Run ID, the literal latest, the literal list, or a task ID.
--sdd-dir PATH | .sdd | Path to the .sdd state directory.
--as-json | off | Emit raw JSONL (one event per line) instead of the Rich table.
--limit N | none | Show only the first N events.
--model NAME | none | Override model for task-trace replay (e.g. opus, sonnet, o3).
--extra-context TEXT | none | Append extra hint text to the replayed task description.

Resolution rules:

  • bernstein replay list — print every recorded run with timing, branch, SHA, event count, log size.
  • bernstein replay latest — replay the most recent run.
  • bernstein replay <run_id> — replay a specific run by directory name.
  • bernstein replay <task_id> (no run with that ID exists) — falls through to task-trace replay: re-submit the task and diff result summaries.
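
The resolution order above can be sketched as a small function. This is an illustration under stated assumptions, not the logic in advanced_cmd.py: resolve_target is a hypothetical name, and "latest" is assumed to mean the lexicographically greatest run directory name (which works for timestamp-style IDs such as 20260415-143022).

```python
import tempfile
from pathlib import Path


def resolve_target(arg, sdd_dir):
    """Classify a replay argument as list, run, or task-trace fallback."""
    runs_dir = Path(sdd_dir) / "runs"
    run_ids = []
    if runs_dir.is_dir():
        run_ids = sorted(p.name for p in runs_dir.iterdir() if p.is_dir())

    if arg == "list":
        return ("list", run_ids)
    if arg == "latest":
        # Assumption: run IDs sort chronologically by name.
        return ("run", run_ids[-1]) if run_ids else ("none", None)
    if arg in run_ids:
        return ("run", arg)
    # No run directory with that name: treat the argument as a task ID.
    return ("task_trace", arg)


# Demo against a throwaway .sdd layout.
with tempfile.TemporaryDirectory() as tmp:
    for run_id in ("20260415-143022", "20260416-090000"):
        (Path(tmp) / "runs" / run_id).mkdir(parents=True)
    latest = resolve_target("latest", tmp)
    by_id = resolve_target("20260415-143022", tmp)
    fallthrough = resolve_target("T-abc123", tmp)
    listing = resolve_target("list", tmp)
```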

The Rich table columns are TIME (offset from run_started), EVENT, AGENT, TASK, DETAIL. Common detail keys: model, role, cost_usd, fingerprint, tick, failed_signals. Events are colour-coded by type (run_started / task_completed are green; agent_reaped and task_verification_failed are red).
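
As a rough illustration of how the TIME column relates to the log, the sketch below reads JSONL text and computes each event's offset from run_started. The timestamp field name and its unit (epoch seconds) are assumptions; check your own replay.jsonl for the actual field.

```python
import json


def time_offsets(jsonl_text):
    """Return (offset_seconds, event_name) pairs relative to run_started."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # Assumption: every event has a numeric "timestamp" in epoch seconds.
    start = next(e["timestamp"] for e in events if e["event"] == "run_started")
    return [(round(e["timestamp"] - start, 3), e["event"]) for e in events]


# Hypothetical log content, one JSON event per line.
sample = "\n".join([
    '{"event": "run_started", "timestamp": 100.0}',
    '{"event": "agent_spawned", "timestamp": 101.5}',
    '{"event": "task_completed", "timestamp": 130.25}',
])
rows = time_offsets(sample)
```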

# What ran most recently?
bernstein replay latest

# Specific run, machine-readable
bernstein replay 20260415-143022 --as-json | jq 'select(.event=="task_completed")'

# Re-execute task T-abc123 on Opus instead of whatever it ran on originally
bernstein replay T-abc123 --model opus --extra-context "Make sure tests pass on Python 3.11."

bernstein replay-filter

A strict superset of bernstein replay for read-only inspection. Adds four orthogonal filters that compose with each other.

Synopsis: bernstein replay-filter RUN_ID [flags]

Flags: (source: cli/commands/replay_filter_cmd.py:164-186)

Flag | Default | Meaning
RUN_ID | required | Same as replay: run ID, latest, list, or task ID.
--sdd-dir PATH | .sdd | Path to the .sdd state directory.
--as-json | off | Emit JSON instead of the Rich table (a single object with run_id, events, and total_matched).
--limit N | none | Show only the first N filtered events.
--filter "k=v,..." | none | Comma-separated key=value filters; values are regex (case-insensitive).
--event-type TYPE | none | Show only events of a specific type.
--agent ID | none | Show only events from this agent (substring match).
--search TEXT | none | Full-text substring search across all event fields.
--model NAME | none | Override model for task-trace replay.
--extra-context TEXT | none | Append text to a replayed task description.

The four filters compose with AND semantics: an event must match every active filter.

Filter expression syntax (--filter): comma-separated key=value pairs. Values are regular expressions, applied case-insensitively against the event field's stringified value. So --filter "role=backend,status=done" keeps events whose role field matches backend AND whose status field matches done.
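
The matching rule just described can be sketched in a few lines. This is an illustration, not the parser in replay_filter_cmd.py: parse_filter and matches are hypothetical names, and two behaviours are assumptions, namely that the regex may match anywhere in the value (re.search rather than a full match) and that an event missing a filtered key is rejected.

```python
import re


def parse_filter(expr):
    """Turn "k=v,k2=v2" into a dict of case-insensitive compiled patterns."""
    filters = {}
    for part in expr.split(","):
        key, _, pattern = part.partition("=")
        filters[key.strip()] = re.compile(pattern.strip(), re.IGNORECASE)
    return filters


def matches(event, filters):
    """AND semantics: every key must be present and its pattern must match."""
    return all(
        key in event and rx.search(str(event[key]))
        for key, rx in filters.items()
    )


# Hypothetical events for demonstration.
events = [
    {"event": "task_completed", "role": "Backend", "status": "done"},
    {"event": "task_completed", "role": "frontend", "status": "done"},
    {"event": "agent_spawned", "role": "backend"},
]
filters = parse_filter("role=backend,status=done")
kept = [e for e in events if matches(e, filters)]
```

Because this sketch splits on commas, a regex value that itself contains a comma cannot be expressed; whether the real CLI has the same restriction is not stated here.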

# Just the agent_spawned events from the latest run
bernstein replay-filter latest --event-type agent_spawned

# Anything mentioning "fail" in the most recent run, as JSON
bernstein replay-filter latest --search fail --as-json | jq '.events[]'

# Backend tasks that completed
bernstein replay-filter latest --filter "role=backend,status=done" --limit 10

# All events from a specific agent session, with full JSON output
bernstein replay-filter 20260415-143022 --agent backend-abc --as-json

When --as-json is set, the filtered output schema is:

{
  "run_id": "<id>",
  "events": [ /* filtered events, capped to --limit */ ],
  "total_matched": 42
}
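
To show how the AND-composed filters and --limit could produce a payload of this shape, here is a hedged sketch. build_payload is a hypothetical helper, the event and agent field names are assumptions, and so is the reading that total_matched counts matches before the --limit cap is applied.

```python
import json


def build_payload(run_id, events, event_type=None, agent=None, search=None, limit=None):
    """Apply the active filters with AND semantics, then cap to limit."""

    def keep(ev):
        if event_type is not None and ev.get("event") != event_type:
            return False
        if agent is not None and agent not in str(ev.get("agent", "")):
            return False
        # Assumption: full-text search is a substring test on the serialized event.
        if search is not None and search not in json.dumps(ev):
            return False
        return True

    matched = [ev for ev in events if keep(ev)]
    shown = matched if limit is None else matched[:limit]
    return {"run_id": run_id, "events": shown, "total_matched": len(matched)}


# Hypothetical events for demonstration.
events = [
    {"event": "task_completed", "agent": "backend-abc-1", "task": "T-1"},
    {"event": "task_completed", "agent": "backend-abc-2", "task": "T-2"},
    {"event": "agent_reaped", "agent": "backend-abc-1"},
    {"event": "task_completed", "agent": "qa-xyz", "task": "T-3"},
]
payload = build_payload(
    "20260415-143022", events,
    event_type="task_completed", agent="backend-abc", limit=1,
)
```

Under that reading, a consumer can detect truncation by comparing len(payload["events"]) with payload["total_matched"].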

Common use cases

Reproduce a flaky failure. Run bernstein replay-filter latest --event-type task_verification_failed to see exactly which gate failed and on which agent. The detail column carries the failed signal names; cross-reference with .sdd/traces/ for the agent's full transcript.

Compare models on the same task. Find the run where the task succeeded:

bernstein replay-filter latest --search "T-abc123" --event-type task_completed

then re-run with a different model:

bernstein replay T-abc123 --model sonnet --extra-context "Use Pydantic v2"

The CLI prints a diff of the two result_summary strings.
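
If you want to post-process the two summaries yourself, a comparable diff can be produced with Python's standard difflib. This is a sketch of the general technique, not the CLI's renderer (which is colourised); summary_diff and the sample strings are made up for illustration.

```python
import difflib


def summary_diff(original, replayed):
    """Return unified-diff lines between two result_summary strings."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            replayed.splitlines(),
            fromfile="original",
            tofile="replay",
            lineterm="",
        )
    )


# Hypothetical summaries from an original run and a replay.
original_summary = "Added request validation\nUsed Pydantic v1 models"
replayed_summary = "Added request validation\nUsed Pydantic v2 models"
diff_lines = summary_diff(original_summary, replayed_summary)
```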

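Verify a fix. After fixing a bug, run the failing task again with bernstein replay <task_id> and compare the result_summary diff against the original. If the summary changes only in the expected places, you have evidence the fix held. Note that the fingerprint is a single hash of the whole event stream, so it tells you only whether two runs are identical, not where they differ.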


Limits

  • Replay does not re-issue HTTP calls. Mocking a remote dependency from the original log is not supported — agents in task-trace replay make fresh calls.
  • Side effects to remote services (PRs, messages, webhooks, DB rows) from the original run are not undone by replay. There is no "rewind" mode.
  • Run-event replay only re-renders what was recorded. If recorder.py (cli/commands/replay_filter_cmd.py:232) didn't capture an event class, it will not appear.
  • Fingerprints depend on the exact recorder version. A replay log written by an older Bernstein may produce a different fingerprint when re-fingerprinted by a newer build.
  • Task-trace replay submits a new task — it does not retroactively re-run the original task in place. The original task's record stays in the archive untouched.

For deeper integrity guarantees, see fingerprint (re-computes the run's SHA-256 and verifies it against a stored reference).