zos-voice — voice-note ingestion.
zos‑voice turns a folder of voice notes into a folder of Markdown transcripts — privately. It sweeps a plain local directory of audio files, transcribes each one with a LOCAL backend you supply (mlx_whisper, whisper.cpp, or any callable), and lands one .md per note, with state markers, partial-success sweeps, retry of failures, and a true dry-run. Python 3.11+ · stdlib only · BUSL-1.1
Status: working code · 49 tests · CI green · BUSL-1.1 · repository private until public launch.
The privacy contract
No function in this package opens a network connection, ever. No bundled model, no hosted API, no telemetry. Subprocesses are only spawned for the command template you configure, with shell=False and a timeout. The test suite enforces this (tests/test_privacy.py): a full sweep runs with socket.socket monkeypatched to raise, and a static guard fails the build if the package ever imports socket/http/urllib/requests.
The input is a plain local folder. If a cloud-sync client (Drive, Dropbox, Syncthing, …) happens to populate that folder, that is entirely the caller's business — zos-voice neither knows nor cares, and never talks to any service on your behalf. What happens to transcripts afterwards (classification, sync, landing into your notes system) is deliberately out of scope — zos-voice stops at the .md on local disk.
Transcribers — zos_voice.transcriber
workdir is a fresh per-note scratch directory provided by the sweep. Failures must raise TranscriptionError(RuntimeError) — one note failed; the sweep records it and continues (partial success).
CommandTranscriber(template, output="stdout", timeout=600, name=None)
Wraps any local CLI.
template: Sequence[str]— argv tokens;{audio}and{workdir}are substituted per note. Executed directly — no shell is ever spawned. Must reference{audio};output="txt"templates must reference{workdir}.output—"stdout": transcript is the command's stdout."txt": the command writes<audio-stem>.txtinto{workdir}(mlx_whisper style).timeout— per-note seconds; expiry raisesTranscriptionError.name— defaults to the binary's basename; lands in transcript frontmatter.
CallableTranscriber(fn, name="callable")
Wraps fn(audio_path: Path) -> str (an in-process model, or a test fake). Non-string/empty results and any exception become TranscriptionError.
| Example backend | Command template | output |
|---|---|---|
| mlx_whisper (Apple Silicon) | mlx_whisper {audio} --model mlx-community/whisper-large-v3-turbo --output-format txt --output-dir {workdir} | txt |
| whisper.cpp | whisper-cli -m /path/to/ggml-base.en.bin -f {audio} -np -nt | stdout |
Sweep — zos_voice.sweep
source_dir— plain local folder of audio notes; read-only to the sweep.out_dir— transcripts land here, mirroring the source tree.root— overrides the state root (default$ZOS_HOME, i.e.~/.zos); the default store is aJsonStateStoreat<root>/voice/state.jsonnamespaced by the resolved source dir.extensions— discovered suffixes;DEFAULT_EXTENSIONS = (".mp3", ".m4a", ".wav", ".flac", ".ogg", ".opus", ".aac", ".webm").state— customStateStore; overridesroot.
discover— sorted source-relative POSIX paths of all audio notes (deterministic). RaisesFileNotFoundErrorifsource_diris missing.run— one sweep cycle. Dry-run writes nothing (no dirs, no transcripts, no state, no transcriber calls) and fillsreport.would_process. Otherwise each pending note is processed in isolation: success writes the.mdand adonemarker; failure writes afailedmarker (error + attempts) and the sweep continues.retry_failed— sweep restricted to notes currently markedfailed.status—{source_dir, out_dir, discovered, done, pending, failed: [rel, ...]}; never writes.pendingincludes failed notes (they will be retried).transcript_path—a/b/note.mp3→<out>/a/b/note.md. Distinct notes sharing a stem within the tree would collide; name them distinctly.
SweepReport (dataclass): dry_run, discovered, skipped_done: [rel], would_process: [rel] (dry-run only), processed: [{rel, transcript}], failed: [{rel, error, attempts}]; to_dict() for JSON.
Transcript format
---
source: <rel path>
source_bytes: <int>
duration_seconds: <float> # only when cheaply available (WAV header)
processed_at: <iso8601 UTC>
transcriber: <name>
---
<text>
State — zos_voice.state
A record: {"status": "done"|"failed", "processed_at", "transcript", "error", "attempts"} keyed by source-relative path. Constants: STATUS_DONE, STATUS_FAILED.
JsonStateStore(path, namespace="")— one JSON file for all records, atomic writes (tmp +os.replace).namespace(the sweep passes the resolved source dir) keeps multiple sweeps collision-free in a shared file. Unreadable/corrupt files read as empty — notes would re-process rather than crash.SidecarStateStore(out_dir)— one<rel>.state.jsonmarker per note inside the output tree, so state travels with the transcripts.- Helpers:
zos_home() -> Path($ZOS_HOMEor~/.zos) ·default_state_path(root=None) -> Path(<root or zos_home()>/voice/state.json).
CLI — zos-voice
Subcommands sweep, status, retry-failed; shared flags --root, --marker {state,sidecar}, --extensions; sweep/retry flags --command (required), --command-output {stdout,txt}, --timeout, --max, --dry-run. Prints a JSON report. Exit codes: 0 success · 1 ≥1 note failed · 2 usage error.
# See what WOULD process — writes nothing at all
zos-voice sweep ~/voice-notes ~/voice-transcripts --dry-run \
--command "mlx_whisper {audio} --model mlx-community/whisper-large-v3-turbo --output-format txt --output-dir {workdir}" \
--command-output txt
# Run it for real
zos-voice sweep ~/voice-notes ~/voice-transcripts \
--command "mlx_whisper {audio} --model mlx-community/whisper-large-v3-turbo --output-format txt --output-dir {workdir}" \
--command-output txt
# Where do things stand?
zos-voice status ~/voice-notes ~/voice-transcripts
# Re-attempt only the notes that failed last time
zos-voice retry-failed ~/voice-notes ~/voice-transcripts --command "..." --command-output txt
--command is substituted ({audio} = the audio file, {workdir} = a per-note scratch directory), split with shlex, and executed without a shell.