Skip to content

State Persistence

Where does my task graph go between orchestrator runs?

Everything Bernstein needs to resume after a crash lives in the project's .sdd/ directory. Task specs are YAML files, every state transition is appended to a hash-chained write-ahead log before being applied, and large artifacts are stored content-addressed. On restart the orchestrator reads the WAL, replays uncommitted entries through an idempotency filter, and resumes the tick loop where it left off.

If you only have time for one sentence: kill bernstein run, restart it in the same workdir, and your task graph comes back. The rest of this document explains why.


The .sdd/ directory layout

Bernstein writes everything under <workdir>/.sdd/. The split between durable (commit-worthy and crash-recoverable) and ephemeral (regenerated each run) is intentional:

Path Purpose Durability
.sdd/backlog/open/*.yaml Tasks waiting to be picked up by the orchestrator durable — author by hand or by tool
.sdd/backlog/claimed/*.yaml Tasks the orchestrator has ingested into the in-memory store durable — moved here by ingest_backlog()
.sdd/backlog/closed/*.yaml Tasks the janitor has finalised durable — append-only history
.sdd/backlog/issues/*.yaml Tasks synced from GitHub issues durable — re-synced each run
.sdd/runtime/wal/<run-id>.wal.jsonl Hash-chained write-ahead log of every orchestrator decision durable for crash recovery
.sdd/runtime/wal/uncommitted.idx.json Sidecar index of in-flight entries (audit-085) rebuildable from WAL
.sdd/runtime/wal/idempotency.jsonl Replay deduplication markers durable — survives crashes
.sdd/metrics/*.jsonl Cost ledger, cascade chain reports, file-health scores durable — append-only
.sdd/cas/{xx}/{sha256} Content-addressed artifact blobs durable — deduplicated
.sdd/audit/merkle/seal-*.json Tamper-evident audit log seals durable — compliance evidence
.sdd/runtime/ (other) PIDs, sockets, log fragments, agent signals ephemeral — never commit
.sdd/worktrees/ Per-agent git worktrees ephemeral — recreated on demand
.sdd/config/mcp_servers.yaml MCP server auto-discovery catalog durable — config

Rule of thumb: anything under .sdd/runtime/ other than wal/ is disposable. Anything under .sdd/backlog/, .sdd/wal/, .sdd/metrics/, .sdd/cas/ or .sdd/audit/ is needed to reconstruct state.

The repo's .gitignore excludes the volatile pieces; the durable pieces can be checked in or shipped to remote storage.


WAL invariants

The write-ahead log is a hash-chained append-only JSONL file. There is one file per orchestrator run:

.sdd/runtime/wal/<run-id>.wal.jsonl

Every line is a JSON object representing one orchestrator decision. WALEntry fields:

@dataclass(frozen=True)
class WALEntry:
    seq: int                # 0-based monotonic sequence
    prev_hash: str          # SHA-256 of previous entry; first is GENESIS_HASH
    entry_hash: str         # SHA-256 over (seq, prev_hash, timestamp, ...)
    timestamp: float        # Unix seconds
    decision_type: str      # e.g. "task_spawn_intent", "task_spawn_confirmed"
    inputs: dict[str, Any]  # what we decided on
    output: dict[str, Any]  # the resulting action's primary key(s)
    actor: str              # which orchestrator component wrote it
    committed: bool = True  # False = pre-execution intent

Source: src/bernstein/core/persistence/wal.py:40-57.

Three invariants make the WAL load-bearing for recovery:

  1. Append-only. WALWriter.append() only ever calls `f.write(...)
  2. f.flush() + os.fsync(...)(wal.py:389-392). No mutation, no truncation. A torn write that leaves a partial trailing line is tolerated by the reader (wal.py:472-475) and by the tail-reader (wal.py:286-330`).
  3. fsync per entry. Every successful append() returns only after the line is on stable storage (wal.py:392). A process crash immediately after append() returns cannot lose the entry.
  4. Hash chain integrity. prev_hash of entry N+1 equals entry_hash of entry N. WALReader.verify_chain() walks the file and reports any break (wal.py:488-499). A tamper, a torn write, or an out-of-order append is detectable.

The pattern for a state transition that has external side effects (e.g. spawning an agent process) is:

WALWriter.append(decision_type="task_spawn_intent", committed=False)
   ↓ fsync returns
spawner.spawn(task)         ← real side effect
WALWriter.append(decision_type="task_spawn_confirmed", committed=True)
WALWriter.mark_committed(intent_seq)  # remove from sidecar index

If the process crashes between the intent and the confirm, the intent remains in the file with committed=False and the sidecar index points to it. Recovery picks up there.

The sidecar index (UncommittedIndex) is a performance cache only — if it is missing, truncated, or stale, recovery falls back to a full WAL scan and rebuilds it (wal.py:75-94). Loss of the index never costs correctness, only one slow boot.


Recovery on restart

When bernstein run starts in a workdir that already contains .sdd/, the recovery sequence is:

1. Load durable backlog
   read .sdd/backlog/{open,claimed,closed}/*.yaml
   reconstruct in-memory task store

2. WALReplayEngine.scan_and_replay()
   a. WALRecovery.scan_all_uncommitted(sdd_dir, exclude_run_id=current)
      → list[(run_id, WALEntry)] where committed=False
   b. For each entry:
        if entry.decision_type in _SKIP_DECISION_TYPES → mark informational
        if (now - entry.timestamp) > MAX_REPLAY_AGE_S (1 h) → mark stale
        if IdempotencyStore.was_executed(key) → skip
        else                                     → replay_handler(entry)
   c. Append "wal_replay_completed" entry to current run's WAL

3. recover_stale_claimed_tasks()
   any task left in CLAIMED state by the dead orchestrator is reset to
   OPEN so a new agent can pick it up
   (core/tasks/task_store_core.py:462)

4. Begin tick loop

Source files: src/bernstein/core/persistence/wal_replay.py:209-315, src/bernstein/core/tasks/task_store_core.py:455-463.

The idempotency store is a JSONL log at .sdd/runtime/wal/idempotency.jsonl mapping (decision_type, entry_hash) → executed. It survives crashes and is consulted before every replay so the same intent is never executed twice (wal_replay.py:79-120).

The 1-hour age cap (_MAX_REPLAY_AGE_S in wal_replay.py:206) means WAL entries older than an hour are marked stale and skipped; the operator must re-trigger the action manually if it is still wanted. This prevents accidental replays of work the operator already abandoned.


CAS store and Merkle integrity

Content-addressed artifacts

CASStore (src/bernstein/core/persistence/cas_store.py) deduplicates arbitrary bytes by SHA-256 digest. Two agents that emit the same patch or the same screenshot store one blob, not two. Layout mirrors git's object store:

.sdd/cas/{first-2-hex-chars}/{full-sha256-hex}
.sdd/cas/{first-2-hex-chars}/{full-sha256-hex}.meta.json

API (cas_store.py:87-):

store = CASStore(Path(".sdd/cas"))
digest = store.put(b"hello world", content_type="text/plain")
assert store.get(digest) == b"hello world"

Each blob has a .meta.json sidecar containing a CASEntry (cas_store.py:42-58) with size, content type, creation timestamp, and arbitrary user metadata. CASStats tracks how many put() calls hit an existing blob (dedup_saves).

Merkle seals over audit logs

merkle.py builds a binary Merkle tree over daily HMAC-chained audit log files. Each file's last-line HMAC becomes a leaf; the root hash is written to .sdd/audit/merkle/seal-<ISO-timestamp>.json and proves no file was deleted, inserted, reordered, or tampered with between seals (src/bernstein/core/persistence/merkle.py:1-58).

This is independent of the WAL hash chain — the WAL protects orchestrator decisions, the Merkle seal protects compliance audit evidence (auth events, policy denials, identity revocations).


Persistence boundary

Durable (survive a crash, can be backed up):

  • All YAML in .sdd/backlog/{open,claimed,closed,issues}/
  • .sdd/runtime/wal/*.wal.jsonl and idempotency.jsonl
  • .sdd/metrics/*.jsonl (cost ledger, cascade chain reports, file health, custom metrics)
  • .sdd/cas/** (artifact blobs and .meta.json sidecars)
  • .sdd/audit/** (HMAC-chained audit logs and Merkle seals)
  • .sdd/config/** (MCP server catalog, model policy, etc.)

Ephemeral (regenerated by the next run):

  • .sdd/runtime/ PID files, sockets, agent signal queues
  • .sdd/runtime/logs/ per-session log fragments
  • .sdd/worktrees/ per-agent git worktrees
  • .sdd/runtime/wal/uncommitted.idx.json (sidecar — rebuildable)

Never commit .sdd/runtime/ or .sdd/worktrees/. The shipped .gitignore excludes both.


Code pointers

Concern File Symbol
WAL writer (append, fsync, hash chain) src/bernstein/core/persistence/wal.py WALWriter:226-438
WAL reader + chain verification src/bernstein/core/persistence/wal.py WALReader:446-549
WAL recovery scan (all runs) src/bernstein/core/persistence/wal.py WALRecovery:660-755
Sidecar uncommitted index src/bernstein/core/persistence/wal.py UncommittedIndex:75-218
Replay engine src/bernstein/core/persistence/wal_replay.py WALReplayEngine:209-315
Idempotency store src/bernstein/core/persistence/wal_replay.py IdempotencyStore:79-150
Stale-claim recovery src/bernstein/core/tasks/task_store_core.py recover_stale_claimed_tasks:455-463
CAS store src/bernstein/core/persistence/cas_store.py CASStore:87-
Merkle seal builder src/bernstein/core/persistence/merkle.py file_leaf_hash:66, tree builder
Backlog ingest (open → claimed) src/bernstein/core/orchestration/orchestrator_backlog.py ingest_backlog:123, _claim_backlog_file:190
Backlog sync (yaml ↔ task server) src/bernstein/core/persistence/sync.py BacklogTask:31-
Disaster-recovery backup paths src/bernstein/core/persistence/disaster_recovery.py _BACKUP_DIRS:46-49

Run-time helpers:

# verify WAL integrity
bernstein doctor

# back up the durable parts (used by `bernstein dr`)
bernstein dr export .sdd

# inspect the replay log
jq -r 'select(.decision_type=="wal_replay_completed") | .inputs' \
  .sdd/runtime/wal/<run-id>.wal.jsonl
  • architecture/LIFECYCLE.md — task and agent state machines that drive the WAL entries.
  • architecture/storage.md — how .sdd/ writes can be redirected to S3/GCS/R2 instead of the local filesystem.
  • operations/disaster-recovery.md — the bernstein dr CLI.