Notifications¶
Audience: SREs wiring Bernstein into existing chat/alert systems.
What: Outbound, fire-and-forget notification drivers — Slack, Telegram, Discord, Email, generic webhook, PagerDuty (via webhook), Jira (via webhook), and shell command. Lifecycle events (pre_task, post_task, pre_merge, post_merge, pre_spawn) are fanned out to every configured sink whose filters match.
Why: Bernstein's chat-control bridges (Telegram/Slack/Discord drivers under core/chat/) are inbound — a human attaches and drives a run interactively. Long-running unattended runs need a symmetric outbound path to push "task failed", "merge landed", "budget exceeded" back to humans without hijacking the same chat. The notifications subsystem is that path. See core/notifications/protocol.py:1-22.
Architecture (5-second mental model)¶
LifecycleEvent -> NotifyLifecycleBridge -> NotificationDispatcher -> Sinks
(subscribes hooks) (dedup + retry + (Slack,
dead-letter + audit) Telegram,
Email, ...)
Code anchors:
- Schema:
core/notifications/config.py:118-156 - Bridge wiring:
core/lifecycle/notify_bridge.py:75-202 - Retry/dedup/dead-letter:
core/notifications/bridge.py:46-67, 232-396 - Driver registry:
core/notifications/registry.py:42-282 - CLI:
cli/commands/notify_cmd.py:63-160
State directory layout:
.sdd/runtime/notifications/
dedup.jsonl # event_id LRU window (default 6h, 2048 entries)
dead_letter.jsonl # rotated at 5 MB; rename suffix .<unix-ts>
Drivers¶
Six first-party kinds ship in the wheel; third-party drivers register via the bernstein.notification_sinks entry-point group (core/notifications/registry.py:37-49).
Common config shape (every sink)¶
notifications:
enabled: true
retry:
max_attempts: 4
initial_delay_ms: 250
backoff_factor: 2.0
max_delay_ms: 30000
sinks:
- id: slack-ops # unique per sink; required
kind: slack # driver kind; required
enabled: true # default true
events: [post_task, post_merge] # null = all
severities: [warning, error] # null = all
labels: {team: platform} # free-form tags
# ... driver-specific keys ...
${ENV_VAR} substitution is supported on every string-typed driver field (core/notifications/sinks/slack.py:73-79, etc.).
slack¶
Incoming Webhook flavour — bot tokens / Block Kit are out of scope. Source: core/notifications/sinks/slack.py.
- id: slack-ops
kind: slack
webhook_url: ${SLACK_WEBHOOK_URL} # required
username: bernstein # optional
icon_emoji: ":robot_face:" # optional
channel: "#ops-alerts" # optional, overrides webhook default
timeout_s: 10.0
telegram¶
Reuses the in-process TelegramBridge so chat-mode and notify-mode share one rate limiter. Source: core/notifications/sinks/telegram.py.
- id: tg-oncall
kind: telegram
chat_id: "-1001234567890" # required (negative for groups)
token: ${BERNSTEIN_TG_TOKEN} # required (or pass live bridge in code)
discord¶
Incoming Webhook flavour, embed-formatted with severity colour mapping (info → blue, warning → yellow, error → red). Source: core/notifications/sinks/discord.py:16-21.
- id: discord-ci
kind: discord
webhook_url: ${DISCORD_WEBHOOK_URL} # required
username: bernstein # optional
avatar_url: https://... # optional
email_smtp¶
stdlib smtplib on a background thread; STARTTLS by default. Source: core/notifications/sinks/email_smtp.py.
- id: email-onfailure
kind: email_smtp
host: smtp.example.com # required
from_addr: bernstein@example.com # required
to_addrs: # required, non-empty
- alice@example.com
- ops@example.com
port: 587 # default 587
username: ${SMTP_USER} # optional
password: ${SMTP_PASS} # optional
starttls: true
ssl: false
timeout_s: 15.0
webhook¶
Generic JSON POST — body is NotificationEvent.to_payload(). Drop-in for PagerDuty Events v2, Jira REST, Opsgenie, custom routers. Source: core/notifications/sinks/webhook.py.
# PagerDuty:
- id: pd-critical
kind: webhook
url: https://events.pagerduty.com/v2/enqueue
headers:
Content-Type: application/json
events: [post_task]
severities: [error]
# Jira:
- id: jira-bug
kind: webhook
url: ${JIRA_BASE_URL}/rest/api/3/issue
headers:
Authorization: Basic ${JIRA_BASIC_AUTH_B64}
Content-Type: application/json
For PagerDuty's exact Events API v2 envelope, the legacy formatter core/communication/notifications.py:287-308 (format_pagerduty) renders the routing-key/dedup-key payload — wrap your own webhook handler around it if you need v2 schema compliance instead of the raw event dump.
shell¶
Spawns argv with the JSON event payload on stdin and a pinned env. No shell interpolation — safe by construction. Source: core/notifications/sinks/shell.py.
- id: pager-script
kind: shell
command: ["/usr/local/bin/page", "--severity"]
env:
EXTRA_KEY: literal-value
timeout_s: 30
non_zero_exit_is_permanent: false # true = no retry on rc != 0
The child process gets PATH, HOME, USER, every BERNSTEIN_* var, plus BERNSTEIN_NOTIFY_{EVENT_ID,KIND,SEVERITY,SINK_ID} and any env: keys declared above (core/notifications/sinks/shell.py:34, 110-124).
Event types¶
Defined in core/notifications/protocol.py:40-55 and aligned 1:1 with bernstein.core.lifecycle.hooks.LifecycleEvent:
| Kind | Trigger |
|---|---|
pre_task | A task is about to be claimed and dispatched. |
post_task | A task finished (success or failure; check severity). |
pre_merge | Drain-merge cycle is about to start. |
post_merge | Drain-merge cycle finished. |
pre_spawn | Agent process is about to be spawned. |
post_spawn | Agent spawned (not subscribed by default). |
synthetic | Test events from bernstein notify test. |
Severity is derived from BERNSTEIN_TASK_OUTCOME in the lifecycle context (core/lifecycle/notify_bridge.py:332-344):
failed/error->errorwarning/warn->warningrolled_backonpost_task/post_merge->warning- otherwise ->
info
Each event carries a stable event_id derived from sha256(event|task|session|timestamp)[:32] so a restart loop dedups correctly (core/lifecycle/notify_bridge.py:347-365). Pre-compute and inject via ctx.env['BERNSTEIN_NOTIFY_EVENT_ID'] if you need explicit control.
bernstein notify group¶
Source: cli/commands/notify_cmd.py.
notify list¶
Print every configured sink as JSON-per-line. No live state — purely reflects the parsed YAML.
$ bernstein notify list --config bernstein.yaml
{"id": "slack-ops", "kind": "slack", "enabled": true, "events": ["post_task"], "severities": null}
{"id": "tg-oncall", "kind": "telegram", "enabled": true, "events": null, "severities": ["error"]}
notify test --sink <id>¶
Fire a synthetic event end-to-end through one driver. Useful before a production run to verify webhooks resolve and tokens authorise.
$ bernstein notify test --sink slack-ops --event synthetic
{"sink_id": "slack-ops", "event_id": "synthetic-slack-ops-1714857600000", "outcome": "dispatched"}
Flags:
--sink <id>— required; matchesbernstein.yaml::notifications.sinks[*].id.--event <kind>— defaults tosynthetic. One ofpre_task,post_task,pre_merge,post_merge,pre_spawn,post_spawn,synthetic.--config <path>— defaults to./bernstein.yaml.
The command builds an isolated NotificationsConfig containing only the named sink, so a smoke test never accidentally spams every sink (cli/commands/notify_cmd.py:107-118).
Severity, filtering, dedup, retry¶
Severity / event filtering¶
Per-sink events and severities arrays are AND-combined. null (YAML omitted) means "all". Validation lives in core/notifications/config.py:95-115:
- Allowed events:
pre_task,post_task,pre_merge,post_merge,pre_spawn,post_spawn,synthetic. - Allowed severities:
info,warning,error.
The bridge filters in core/lifecycle/notify_bridge.py:186-197 before calling the dispatcher.
Deduplication (rate-limit by content)¶
The dispatcher keeps an LRU + on-disk window keyed by event_id (core/notifications/bridge.py:94-174). Defaults:
dedup_lru_size: 2048 entries (in-memory)dedup_window_seconds: 21600 (6 hours)
Both knobs live under notifications: at top level in bernstein.yaml. A restart re-seeds the LRU from .sdd/runtime/notifications/dedup.jsonl so a crash loop is throttled too.
Retry policy¶
Exponential backoff per sink, capped by max_delay_ms (core/notifications/bridge.py:70-91):
notifications:
retry:
max_attempts: 4 # 1 disables retry
initial_delay_ms: 250
backoff_factor: 2.0
max_delay_ms: 30000
Drivers raise NotificationDeliveryError for transient failures (the dispatcher retries) and NotificationPermanentError for non-retryable ones (skip backoff, write straight to dead-letter) (core/notifications/protocol.py:77-92).
Dead-lettering¶
Permanent failures (and exhausted retries) append to .sdd/runtime/notifications/dead_letter.jsonl as one JSON record per line (core/notifications/bridge.py:177-229). Rotated to dead_letter.jsonl.<unix-ts> past 5 MB.
Audit¶
Every terminal outcome (delivered / deduplicated / failed_retrying / failed_permanent) flows through an optional audit_hook (core/notifications/bridge.py:368-396) so the orchestrator can append a tamper-evident record to the HMAC chain shared with security.audit.AuditLog. Wired by default when the bridge is constructed via build_bridge_from_config (core/lifecycle/notify_bridge.py:229-297).
Code pointers¶
src/bernstein/core/notifications/protocol.py:1-22— design rationalesrc/bernstein/core/notifications/protocol.py:40-92— event/outcome enums + error classessrc/bernstein/core/notifications/protocol.py:95-187—NotificationEvent+NotificationSinkprotocolsrc/bernstein/core/notifications/config.py:46-156— pydantic schema, validatorssrc/bernstein/core/notifications/bridge.py:94-229—DedupCache,DeadLettersrc/bernstein/core/notifications/bridge.py:232-396—NotificationDispatchersrc/bernstein/core/notifications/registry.py:42-282— first-party kinds + entry-point loadersrc/bernstein/core/notifications/sinks/slack.py— Slacksrc/bernstein/core/notifications/sinks/telegram.py— Telegramsrc/bernstein/core/notifications/sinks/discord.py— Discordsrc/bernstein/core/notifications/sinks/email_smtp.py— SMTP emailsrc/bernstein/core/notifications/sinks/webhook.py— generic JSON POSTsrc/bernstein/core/notifications/sinks/shell.py— shell commandsrc/bernstein/core/lifecycle/notify_bridge.py— lifecycle hook gluesrc/bernstein/cli/commands/notify_cmd.py—bernstein notify {test,list}src/bernstein/core/communication/notifications.py:97-308— legacy formatters (format_slack,format_discord,format_telegram,format_webhook,format_pagerduty) re-exported viabernstein.core.notificationsfor backwards compatibility