State Persistence¶
Where does my task graph go between orchestrator runs?
Everything Bernstein needs to resume after a crash lives in the project's .sdd/ directory. Task specs are YAML files, every state transition is appended to a hash-chained write-ahead log before being applied, and large artifacts are stored content-addressed. On restart the orchestrator reads the WAL, replays uncommitted entries through an idempotency filter, and resumes the tick loop where it left off.
If you only have time for one sentence: kill bernstein run, restart it in the same workdir, and your task graph comes back. The rest of this document explains why.
The .sdd/ directory layout¶
Bernstein writes everything under <workdir>/.sdd/. The split between durable (commit-worthy and crash-recoverable) and ephemeral (regenerated each run) is intentional:
| Path | Purpose | Durability |
|---|---|---|
.sdd/backlog/open/*.yaml | Tasks waiting to be picked up by the orchestrator | durable — author by hand or by tool |
.sdd/backlog/claimed/*.yaml | Tasks the orchestrator has ingested into the in-memory store | durable — moved here by ingest_backlog() |
.sdd/backlog/closed/*.yaml | Tasks the janitor has finalised | durable — append-only history |
.sdd/backlog/issues/*.yaml | Tasks synced from GitHub issues | durable — re-synced each run |
.sdd/runtime/wal/<run-id>.wal.jsonl | Hash-chained write-ahead log of every orchestrator decision | durable for crash recovery |
.sdd/runtime/wal/uncommitted.idx.json | Sidecar index of in-flight entries (audit-085) | rebuildable from WAL |
.sdd/runtime/wal/idempotency.jsonl | Replay deduplication markers | durable — survives crashes |
.sdd/metrics/*.jsonl | Cost ledger, cascade chain reports, file-health scores | durable — append-only |
.sdd/cas/{xx}/{sha256} | Content-addressed artifact blobs | durable — deduplicated |
.sdd/audit/merkle/seal-*.json | Tamper-evident audit log seals | durable — compliance evidence |
.sdd/runtime/ (other) | PIDs, sockets, log fragments, agent signals | ephemeral — never commit |
.sdd/worktrees/ | Per-agent git worktrees | ephemeral — recreated on demand |
.sdd/config/mcp_servers.yaml | MCP server auto-discovery catalog | durable — config |
Rule of thumb: anything under .sdd/runtime/ other than wal/ is disposable. Anything under .sdd/backlog/, .sdd/wal/, .sdd/metrics/, .sdd/cas/ or .sdd/audit/ is needed to reconstruct state.
The repo's .gitignore excludes the volatile pieces; the durable pieces can be checked in or shipped to remote storage.
WAL invariants¶
The write-ahead log is a hash-chained append-only JSONL file. There is one file per orchestrator run:
Every line is a JSON object representing one orchestrator decision. WALEntry fields:
@dataclass(frozen=True)
class WALEntry:
seq: int # 0-based monotonic sequence
prev_hash: str # SHA-256 of previous entry; first is GENESIS_HASH
entry_hash: str # SHA-256 over (seq, prev_hash, timestamp, ...)
timestamp: float # Unix seconds
decision_type: str # e.g. "task_spawn_intent", "task_spawn_confirmed"
inputs: dict[str, Any] # what we decided on
output: dict[str, Any] # the resulting action's primary key(s)
actor: str # which orchestrator component wrote it
committed: bool = True # False = pre-execution intent
Source: src/bernstein/core/persistence/wal.py:40-57.
Three invariants make the WAL load-bearing for recovery:
- Append-only.
WALWriter.append()only ever calls `f.write(...) - f.flush() + os.fsync(...)
(wal.py:389-392). No mutation, no truncation. A torn write that leaves a partial trailing line is tolerated by the reader (wal.py:472-475) and by the tail-reader (wal.py:286-330`). - fsync per entry. Every successful
append()returns only after the line is on stable storage (wal.py:392). A process crash immediately afterappend()returns cannot lose the entry. - Hash chain integrity.
prev_hashof entry N+1 equalsentry_hashof entry N.WALReader.verify_chain()walks the file and reports any break (wal.py:488-499). A tamper, a torn write, or an out-of-order append is detectable.
The pattern for a state transition that has external side effects (e.g. spawning an agent process) is:
WALWriter.append(decision_type="task_spawn_intent", committed=False)
↓ fsync returns
spawner.spawn(task) ← real side effect
↓
WALWriter.append(decision_type="task_spawn_confirmed", committed=True)
↓
WALWriter.mark_committed(intent_seq) # remove from sidecar index
If the process crashes between the intent and the confirm, the intent remains in the file with committed=False and the sidecar index points to it. Recovery picks up there.
The sidecar index (UncommittedIndex) is a performance cache only — if it is missing, truncated, or stale, recovery falls back to a full WAL scan and rebuilds it (wal.py:75-94). Loss of the index never costs correctness, only one slow boot.
Recovery on restart¶
When bernstein run starts in a workdir that already contains .sdd/, the recovery sequence is:
1. Load durable backlog
read .sdd/backlog/{open,claimed,closed}/*.yaml
reconstruct in-memory task store
2. WALReplayEngine.scan_and_replay()
a. WALRecovery.scan_all_uncommitted(sdd_dir, exclude_run_id=current)
→ list[(run_id, WALEntry)] where committed=False
b. For each entry:
if entry.decision_type in _SKIP_DECISION_TYPES → mark informational
if (now - entry.timestamp) > MAX_REPLAY_AGE_S (1 h) → mark stale
if IdempotencyStore.was_executed(key) → skip
else → replay_handler(entry)
c. Append "wal_replay_completed" entry to current run's WAL
3. recover_stale_claimed_tasks()
any task left in CLAIMED state by the dead orchestrator is reset to
OPEN so a new agent can pick it up
(core/tasks/task_store_core.py:462)
4. Begin tick loop
Source files: src/bernstein/core/persistence/wal_replay.py:209-315, src/bernstein/core/tasks/task_store_core.py:455-463.
The idempotency store is a JSONL log at .sdd/runtime/wal/idempotency.jsonl mapping (decision_type, entry_hash) → executed. It survives crashes and is consulted before every replay so the same intent is never executed twice (wal_replay.py:79-120).
The 1-hour age cap (_MAX_REPLAY_AGE_S in wal_replay.py:206) means WAL entries older than an hour are marked stale and skipped; the operator must re-trigger the action manually if it is still wanted. This prevents accidental replays of work the operator already abandoned.
CAS store and Merkle integrity¶
Content-addressed artifacts¶
CASStore (src/bernstein/core/persistence/cas_store.py) deduplicates arbitrary bytes by SHA-256 digest. Two agents that emit the same patch or the same screenshot store one blob, not two. Layout mirrors git's object store:
.sdd/cas/{first-2-hex-chars}/{full-sha256-hex}
.sdd/cas/{first-2-hex-chars}/{full-sha256-hex}.meta.json
API (cas_store.py:87-):
store = CASStore(Path(".sdd/cas"))
digest = store.put(b"hello world", content_type="text/plain")
assert store.get(digest) == b"hello world"
Each blob has a .meta.json sidecar containing a CASEntry (cas_store.py:42-58) with size, content type, creation timestamp, and arbitrary user metadata. CASStats tracks how many put() calls hit an existing blob (dedup_saves).
Merkle seals over audit logs¶
merkle.py builds a binary Merkle tree over daily HMAC-chained audit log files. Each file's last-line HMAC becomes a leaf; the root hash is written to .sdd/audit/merkle/seal-<ISO-timestamp>.json and proves no file was deleted, inserted, reordered, or tampered with between seals (src/bernstein/core/persistence/merkle.py:1-58).
This is independent of the WAL hash chain — the WAL protects orchestrator decisions, the Merkle seal protects compliance audit evidence (auth events, policy denials, identity revocations).
Persistence boundary¶
Durable (survive a crash, can be backed up):
- All YAML in
.sdd/backlog/{open,claimed,closed,issues}/ .sdd/runtime/wal/*.wal.jsonlandidempotency.jsonl.sdd/metrics/*.jsonl(cost ledger, cascade chain reports, file health, custom metrics).sdd/cas/**(artifact blobs and.meta.jsonsidecars).sdd/audit/**(HMAC-chained audit logs and Merkle seals).sdd/config/**(MCP server catalog, model policy, etc.)
Ephemeral (regenerated by the next run):
.sdd/runtime/PID files, sockets, agent signal queues.sdd/runtime/logs/per-session log fragments.sdd/worktrees/per-agent git worktrees.sdd/runtime/wal/uncommitted.idx.json(sidecar — rebuildable)
Never commit .sdd/runtime/ or .sdd/worktrees/. The shipped .gitignore excludes both.
Code pointers¶
| Concern | File | Symbol |
|---|---|---|
| WAL writer (append, fsync, hash chain) | src/bernstein/core/persistence/wal.py | WALWriter:226-438 |
| WAL reader + chain verification | src/bernstein/core/persistence/wal.py | WALReader:446-549 |
| WAL recovery scan (all runs) | src/bernstein/core/persistence/wal.py | WALRecovery:660-755 |
| Sidecar uncommitted index | src/bernstein/core/persistence/wal.py | UncommittedIndex:75-218 |
| Replay engine | src/bernstein/core/persistence/wal_replay.py | WALReplayEngine:209-315 |
| Idempotency store | src/bernstein/core/persistence/wal_replay.py | IdempotencyStore:79-150 |
| Stale-claim recovery | src/bernstein/core/tasks/task_store_core.py | recover_stale_claimed_tasks:455-463 |
| CAS store | src/bernstein/core/persistence/cas_store.py | CASStore:87- |
| Merkle seal builder | src/bernstein/core/persistence/merkle.py | file_leaf_hash:66, tree builder |
| Backlog ingest (open → claimed) | src/bernstein/core/orchestration/orchestrator_backlog.py | ingest_backlog:123, _claim_backlog_file:190 |
| Backlog sync (yaml ↔ task server) | src/bernstein/core/persistence/sync.py | BacklogTask:31- |
| Disaster-recovery backup paths | src/bernstein/core/persistence/disaster_recovery.py | _BACKUP_DIRS:46-49 |
Run-time helpers:
# verify WAL integrity
bernstein doctor
# back up the durable parts (used by `bernstein dr`)
bernstein dr export .sdd
# inspect the replay log
jq -r 'select(.decision_type=="wal_replay_completed") | .inputs' \
.sdd/runtime/wal/<run-id>.wal.jsonl
Related¶
architecture/LIFECYCLE.md— task and agent state machines that drive the WAL entries.architecture/storage.md— how.sdd/writes can be redirected to S3/GCS/R2 instead of the local filesystem.operations/disaster-recovery.md— thebernstein drCLI.