Voice control
bernstein listen is an experimental voice front-end: capture audio from your microphone, transcribe it locally with faster-whisper, match the result against a small grammar plus a user-defined alias file, and either print or run the resulting Bernstein CLI command.
reference/FEATURE_MATRIX.md:185 flags this as experimental. The parser only knows a handful of phrases, the audio capture loop is a single thread, and there is no wake-word — once started, every utterance above the silence threshold is transcribed and matched. Plan accordingly.
The CLI lives at cli/commands/voice_cmd.py:437 (@click.command("listen")).
What bernstein listen does
The command is a thin loop (cli/commands/voice_cmd.py:378-429):
- Wait for audio above the RMS silence threshold.
- Record until the speaker pauses (`max_silence_chunks` of silence).
- Pass the audio to a `faster_whisper.WhisperModel` and read back text (voice_cmd.py:244-280).
- Look up the text in `~/.bernstein/voice.yaml` (exact, then prefix match) (voice_cmd.py:175-201).
- If no alias matched, walk the built-in regex grammar (voice_cmd.py:70-124).
- With `--dry-run`, print the parsed CLI command. Otherwise, exec it via `subprocess.run(shlex.split(cmd), check=False)` (voice_cmd.py:419-427).
The base command used for built-in patterns is {sys.executable} -m bernstein (voice_cmd.py:166-172). Alias strings are passed through verbatim.
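The final dispatch step can be sketched roughly as follows. This is a minimal illustration, not the code in voice_cmd.py: the function name `dispatch` and its return value are assumptions, and only the `subprocess.run(shlex.split(cmd), check=False)` call comes from the description above.

```python
import shlex
import subprocess
from typing import Optional


def dispatch(cmd: str, dry_run: bool) -> Optional[int]:
    """Print the command under dry-run; otherwise run it without a shell.

    Hypothetical helper: returns the exit code, or None when nothing ran.
    """
    if dry_run:
        print(cmd)
        return None
    # shlex.split tokenizes like a POSIX shell, so a quoted goal stays one argument.
    argv = shlex.split(cmd)
    # check=False: a non-zero exit code is returned, not raised as an exception.
    return subprocess.run(argv, check=False).returncode


# A quoted goal survives tokenization as a single argument.
print(shlex.split('bernstein -g "the auth refactor" -j 3'))
```

Because the command is split into an argument list and never handed to a shell, shell metacharacters in a transcribed goal are passed to the child process as literal text.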
Setup
Requirements (voice_cmd.py:283-296):
- `sounddevice`: microphone capture (PortAudio under the hood).
- `numpy`: audio buffers.
- `faster-whisper`: local STT. The first run downloads the chosen model (`tiny` ≈ 39 MB, `base` ≈ 150 MB, `small` ≈ 460 MB; larger models scale up). The model is cached under `~/.cache/huggingface/`.
Platform notes:
- Linux: install the PortAudio runtime library (`libportaudio2` on Debian / Ubuntu, `portaudio` on Arch).
- macOS: `brew install portaudio` is enough.
- Windows: `pip install sounddevice` ships prebuilt wheels.
The CLI exits with a clear install hint if any of the three deps are missing (voice_cmd.py:271-296).
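A dependency check of this kind might look like the sketch below. The names `REQUIRED`, `missing_packages`, and `install_hint` are hypothetical, and the real command may word its hint differently. The sketch only shows that the import name and the pip name differ for faster-whisper.

```python
import importlib.util
from typing import Dict, List

# Import name -> pip package name (they differ for faster-whisper).
REQUIRED: Dict[str, str] = {
    "sounddevice": "sounddevice",
    "numpy": "numpy",
    "faster_whisper": "faster-whisper",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return pip names of modules that cannot be found by this interpreter."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


def install_hint(missing: List[str]) -> str:
    """Build a one-line install command, or an empty string if nothing is missing."""
    return "pip install " + " ".join(missing) if missing else ""


print(install_hint(missing_packages()) or "all voice dependencies found")
```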
Commands recognized
The default grammar (voice_cmd.py:70-124) maps phrases to CLI invocations:
| Said | Runs |
|---|---|
| run three agents on the auth refactor | `bernstein -g "the auth refactor" -j 3` |
| run agents on add tests for parser | `bernstein -g "add tests for parser"` |
| run deploy to production | `bernstein -g "deploy to production"` |
| status / show status | `bernstein status` |
| stop / stop agents / stop all | `bernstein stop` |
| list agents / show agents | `bernstein agents list` |
| recap / show recap / show results | `bernstein recap` |
| logs / show logs | `bernstein logs` |
| cost / show cost | `bernstein cost` |
| plan / show plan | `bernstein plan` |
| help | `bernstein --help` |
Number words one through ten are accepted in the worker count (voice_cmd.py:33-58) — "run three agents" and "run 3 agents" both parse to -j 3.
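To make the mapping concrete, here is a minimal sketch of how one grammar entry with a number-word table could work. The regex, the names `NUMBER_WORDS`, `RUN_AGENTS`, `parse_count`, and `parse_run`, and the handling of unknown counts are all assumptions for illustration; the actual patterns in voice_cmd.py may differ. It covers only the "run [count] agents on <goal>" phrasing.

```python
import re
from typing import Optional

# Number words one through ten, as described above.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Hypothetical pattern: "run [<count>] agents on <goal>".
RUN_AGENTS = re.compile(r"^run(?:\s+(?P<count>\w+))?\s+agents\s+on\s+(?P<goal>.+)$")


def parse_count(token: str) -> Optional[int]:
    """Accept digits ("3") or a number word ("three")."""
    if token.isdecimal():
        return int(token)
    return NUMBER_WORDS.get(token)


def parse_run(utterance: str) -> Optional[str]:
    """Return a CLI string for a matching utterance, else None."""
    text = utterance.strip().lower().rstrip(".!?")
    match = RUN_AGENTS.match(text)
    if match is None:
        return None
    # Drop double quotes so the goal cannot break out of its quoted argument.
    goal = match.group("goal").replace('"', "").strip()
    command = f'bernstein -g "{goal}"'
    token = match.group("count")
    if token is not None:
        count = parse_count(token)
        if count is None:  # e.g. "eleven": outside the table, treat as no match
            return None
        command += f" -j {count}"
    return command


print(parse_run("Run three agents on the auth refactor."))
print(parse_run("run 3 agents on the auth refactor"))
```

Both calls print `bernstein -g "the auth refactor" -j 3`.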
Custom aliases
Drop a YAML file at ~/.bernstein/voice.yaml (override with --alias-file). Each entry maps a spoken phrase to the command string to run.
The lookup is done in lowercase. Exact matches win first; otherwise the parser walks the alias keys looking for a prefix match against the utterance (voice_cmd.py:189-198). If your alias starts with the same words as a built-in phrase, the alias wins.
The alias file is parsed with yaml.safe_load, and entries with non-string values are silently ignored (voice_cmd.py:148-158). If the file is malformed, the command prints a yellow warning and continues with no aliases.
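The lookup rules above can be sketched as follows. This is an illustration under stated assumptions: the alias entries are invented, the names `clean_aliases` and `match_alias` are hypothetical, and "prefix match" is read as "the utterance starts with the alias phrase". The dict stands in for what `yaml.safe_load` would return for the file.

```python
from typing import Dict, Optional


def clean_aliases(raw: object) -> Dict[str, str]:
    """Keep only string-to-string entries and lowercase the spoken phrase."""
    if not isinstance(raw, dict):
        return {}
    return {
        key.strip().lower(): value
        for key, value in raw.items()
        if isinstance(key, str) and isinstance(value, str)
    }


def match_alias(utterance: str, aliases: Dict[str, str]) -> Optional[str]:
    """Exact match first, then the first alias phrase the utterance starts with."""
    text = utterance.strip().lower()
    if text in aliases:
        return aliases[text]
    for phrase, command in aliases.items():
        # Assumed direction: the utterance begins with the alias phrase.
        if phrase and text.startswith(phrase):
            return command
    return None


# Stand-in for a parsed voice.yaml; the second entry has a non-string value.
aliases = clean_aliases({"Ship it": "bernstein -g 'cut a release' -j 2", "broken": 3})
print(match_alias("ship it now", aliases))
```

Prefix matching follows dictionary order here, so when two alias phrases share a prefix, the one listed first in the file wins in this sketch.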
CLI flags
bernstein listen [--dry-run]
[--model tiny|base|small|medium|large-v2]
[--alias-file PATH]
[--threshold 0.01]
[--min-duration 0.5]
- `--dry-run`: print the parsed command without executing (voice_cmd.py:438-444).
- `--model`: Whisper model size. Smaller is faster, larger is more accurate. Default `base` (voice_cmd.py:445-452).
- `--alias-file`: alternate path to `voice.yaml`. Default `~/.bernstein/voice.yaml` (voice_cmd.py:27, voice_cmd.py:453-460).
- `--threshold`: RMS amplitude threshold for speech vs silence. Higher = need to speak louder (voice_cmd.py:462-467).
- `--min-duration`: minimum utterance length in seconds before transcription is attempted (voice_cmd.py:469-475). Avoids transcribing single noises.
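To show how `--threshold` and `--min-duration` interact, here is a simplified, stdlib-only sketch of RMS gating over fixed-size chunks. It is not the capture loop from voice_cmd.py, which works on numpy buffers from sounddevice. The function names, the chunk length, and the `max_silence_chunks` default are assumptions; the `0.01` and `0.5` values are taken from the usage line above.

```python
import math
from typing import Iterable, List, Sequence


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square amplitude of one chunk of samples in [-1.0, 1.0]."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def segment(
    chunks: Iterable[Sequence[float]],
    threshold: float = 0.01,       # value shown for --threshold
    min_duration: float = 0.5,     # value shown for --min-duration
    max_silence_chunks: int = 3,   # assumed value
    chunk_seconds: float = 0.1,    # assumed chunk length
) -> List[Sequence[float]]:
    """Return the speech chunks of the first utterance, or [] if it is too short."""
    recorded: List[Sequence[float]] = []
    silent = 0
    for chunk in chunks:
        loud = rms(chunk) >= threshold
        if not recorded:
            if loud:               # speech starts at the first loud chunk
                recorded.append(chunk)
            continue
        recorded.append(chunk)
        silent = 0 if loud else silent + 1
        if silent >= max_silence_chunks:
            break                  # the speaker paused long enough
    speech = recorded[: len(recorded) - silent]  # trim trailing silence
    if len(speech) * chunk_seconds < min_duration:
        return []                  # too short: likely a click or a cough
    return speech


loud, quiet = [0.5, -0.5], [0.0, 0.0]
print(len(segment([quiet, loud, loud, loud, loud, loud, loud, quiet, quiet, quiet])))
```

Raising `threshold` makes the gate ignore quieter input, and raising `min_duration` discards more short bursts before any transcription work is done.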
Privacy: local-vs-cloud STT
bernstein listen is local-only. faster-whisper ships an offline inference engine; no audio is sent anywhere. The model weights download once on first use and live under ~/.cache/huggingface/. This is deliberate (voice_cmd.py:1-8):
- No transcripts of operator commands hit a cloud STT vendor.
- No corporate codebase names, ticket IDs, or CI tokens (heard in the background of the mic) are exfiltrated.
- The transcription quality is still good enough for the short imperative phrases the parser knows about.
There is no telemetry of utterances or matched commands; the only output is the rendered command on stdout and (without --dry-run) the spawned bernstein subprocess.
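For orientation, local transcription with faster-whisper could be wrapped roughly like this. The helper names `load_model` and `transcribe` are hypothetical, and the `device="cpu"` and `compute_type="int8"` settings are assumptions rather than what voice_cmd.py uses. `WhisperModel.transcribe` returns a lazy iterator of segments plus an info object, and inference runs on the local machine.

```python
from typing import Any


def load_model(size: str = "base") -> Any:
    """Load a local Whisper model; weights download on first use, then come from cache."""
    # Imported lazily so a missing dependency only fails when voice is used.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU inference with int8 weights.
    return WhisperModel(size, device="cpu", compute_type="int8")


def transcribe(model: Any, audio: Any) -> str:
    """Join the text of every segment into one utterance string."""
    # language="en" mirrors the hard-coded language noted in this document.
    segments, _info = model.transcribe(audio, language="en")
    # Segments are produced lazily; iterating them is what runs the decoding.
    return " ".join(segment.text.strip() for segment in segments).strip()
```

A typical call would be `transcribe(load_model("base"), audio)`, where `audio` is a file path or a float32 mono array sampled at 16 kHz.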
Limitations
This subsystem is experimental and has known gaps:
- No wake word. Once started, every utterance above the silence threshold is transcribed and dispatched. Run inside a quiet room or use `--dry-run` while you tune.
- Tiny grammar. Only the eleven patterns above plus your alias file. Anything outside the grammar prints `No matching command. Try: run/status/stop/list agents/recap.` (voice_cmd.py:412-414) and is ignored.
- English only. Whisper supports more languages, but the grammar patterns are English-only and the transcribe call hard-codes `language="en"` (voice_cmd.py:254).
- Single mic, default device. The `sd.InputStream` uses whatever PortAudio considers the default input. There is no flag to pick a device.
- No live partials. Each utterance is recorded fully, then transcribed end-to-end. There is a perceptible pause between speech and command dispatch.
- Subprocess shell is your default `bernstein`. Voice does not share state with an existing `bernstein run` process; it just spawns a new CLI invocation per utterance. If you say "stop", that is a fresh `bernstein stop` against the current working directory.
- No safe-mode for destructive commands. `bernstein listen` will happily fire `bernstein stop` from a noisy meeting if the word "stop" is heard. Use `--dry-run` until you are sure of the environment.
- Alias matching is naïve. Lowercased exact-or-prefix only. There is no fuzzy matching, no synonyms, no diacritic folding.
- No daemon mode. The command runs in the foreground until `Ctrl-C`. Wrap in `tmux` / `nohup` / a launchd or systemd unit if you want it to survive a terminal close.
If voice control becomes load-bearing for your workflow, consider contributing additional grammar entries and / or a wake-word stage. The architecture in voice_cmd.py is small enough to extend without a larger refactor.
Code pointers
- `cli/commands/voice_cmd.py:437`: `@click.command("listen")` entry point.
- `cli/commands/voice_cmd.py:27`: `_DEFAULT_ALIAS_FILE` = `~/.bernstein/voice.yaml`.
- `cli/commands/voice_cmd.py:33-58`: number-word table and worker count parsing.
- `cli/commands/voice_cmd.py:70-124`: built-in grammar (`(regex, command_template)` pairs).
- `cli/commands/voice_cmd.py:131-158`: `_load_aliases` (`yaml.safe_load`).
- `cli/commands/voice_cmd.py:166-172`: `_base_command()` = `{sys.executable} -m bernstein`.
- `cli/commands/voice_cmd.py:175-225`: `parse_utterance` (alias → grammar fallback).
- `cli/commands/voice_cmd.py:244-280`: whisper transcription helpers, `_load_whisper_model`.
- `cli/commands/voice_cmd.py:283-296`: sounddevice / numpy import + install hint.
- `cli/commands/voice_cmd.py:299-370`: record-until-silence loop and `_capture_and_transcribe`.
- `cli/commands/voice_cmd.py:378-429`: main `_listen_loop` with `--dry-run` and subprocess dispatch.
- `docs/reference/FEATURE_MATRIX.md:185`: flags `bernstein listen` as Voice commands (experimental).