Documentation · zos-voice

zos-voice — voice-note ingestion.

Privacy-first voice ingestion, derived from the source of zos_voice 0.1.0. Working code, June 2026.

zos‑voice turns a folder of voice notes into a folder of Markdown transcripts — privately. It sweeps a plain local directory of audio files, transcribes each one with a LOCAL backend you supply (mlx_whisper, whisper.cpp, or any callable), and lands one .md per note, with state markers, partial-success sweeps, retry of failures, and a true dry-run. Python 3.11+ · stdlib only · BUSL-1.1

Status: working code · 49 tests · CI green · BUSL-1.1 · repository private until public launch.

The privacy contract

No function in this package opens a network connection, ever. No bundled model, no hosted API, no telemetry. Subprocesses are only spawned for the command template you configure, with shell=False and a timeout. The test suite enforces this (tests/test_privacy.py): a full sweep runs with socket.socket monkeypatched to raise, and a static guard fails the build if the package ever imports socket/http/urllib/requests.

The input is a plain local folder. If a cloud-sync client (Drive, Dropbox, Syncthing, …) happens to populate that folder, that is entirely the caller's business — zos-voice neither knows nor cares, and never talks to any service on your behalf. What happens to transcripts afterwards (classification, sync, landing into your notes system) is deliberately out of scope — zos-voice stops at the .md on local disk.

Transcribers — `zos_voice.transcriber`

class Transcriber(Protocol): name: str def transcribe(self, audio_path: Path, workdir: Path) -> str: ...

workdir is a fresh per-note scratch directory provided by the sweep. Failures must raise TranscriptionError(RuntimeError) — one note failed; the sweep records it and continues (partial success).

CommandTranscriber(template, output="stdout", timeout=600, name=None)

Wraps any local CLI.

template: Sequence[str] — argv tokens; {audio} and {workdir} are substituted per note. Executed directly — no shell is ever spawned. Must reference {audio}; output="txt" templates must reference {workdir}.
output — "stdout": transcript is the command's stdout. "txt": the command writes <audio-stem>.txt into {workdir} (mlx_whisper style).
timeout — per-note seconds; expiry raises TranscriptionError.
name — defaults to the binary's basename; lands in transcript frontmatter.

CallableTranscriber(fn, name="callable")

Wraps fn(audio_path: Path) -> str (an in-process model, or a test fake). Non-string/empty results and any exception become TranscriptionError.

Example backend	Command template	`output`
mlx_whisper (Apple Silicon)	`mlx_whisper {audio} --model mlx-community/whisper-large-v3-turbo --output-format txt --output-dir {workdir}`	`txt`
whisper.cpp	`whisper-cli -m /path/to/ggml-base.en.bin -f {audio} -np -nt`	`stdout`

Sweep — `zos_voice.sweep`

Sweep(source_dir, out_dir, transcriber, root=None, extensions=DEFAULT_EXTENSIONS, state=None)

source_dir — plain local folder of audio notes; read-only to the sweep.
out_dir — transcripts land here, mirroring the source tree.
root — overrides the state root (default $ZOS_HOME, i.e. ~/.zos); the default store is a JsonStateStore at <root>/voice/state.json namespaced by the resolved source dir.
extensions — discovered suffixes; DEFAULT_EXTENSIONS = (".mp3", ".m4a", ".wav", ".flac", ".ogg", ".opus", ".aac", ".webm").
state — custom StateStore; overrides root.

discover() -> list[str] run(dry_run=False, max_notes=None, only_failed=False) -> SweepReport retry_failed(dry_run=False, max_notes=None) -> SweepReport status() -> dict transcript_path(rel) -> Path

discover — sorted source-relative POSIX paths of all audio notes (deterministic). Raises FileNotFoundError if source_dir is missing.
run — one sweep cycle. Dry-run writes nothing (no dirs, no transcripts, no state, no transcriber calls) and fills report.would_process. Otherwise each pending note is processed in isolation: success writes the .md and a done marker; failure writes a failed marker (error + attempts) and the sweep continues.
retry_failed — sweep restricted to notes currently marked failed.
status — {source_dir, out_dir, discovered, done, pending, failed: [rel, ...]}; never writes. pending includes failed notes (they will be retried).
transcript_path — a/b/note.mp3 → <out>/a/b/note.md. Distinct notes sharing a stem within the tree would collide; name them distinctly.

SweepReport (dataclass): dry_run, discovered, skipped_done: [rel], would_process: [rel] (dry-run only), processed: [{rel, transcript}], failed: [{rel, error, attempts}]; to_dict() for JSON.

Transcript format

---
source: <rel path>
source_bytes: <int>
duration_seconds: <float>     # only when cheaply available (WAV header)
processed_at: <iso8601 UTC>
transcriber: <name>
---

<text>

State — `zos_voice.state`

A record: {"status": "done"|"failed", "processed_at", "transcript", "error", "attempts"} keyed by source-relative path. Constants: STATUS_DONE, STATUS_FAILED.

class StateStore(Protocol): def get(self, rel) -> dict | None: ... def set(self, rel, record): ... def all(self) -> dict[str, dict]: ...

JsonStateStore(path, namespace="") — one JSON file for all records, atomic writes (tmp + os.replace). namespace (the sweep passes the resolved source dir) keeps multiple sweeps collision-free in a shared file. Unreadable/corrupt files read as empty — notes would re-process rather than crash.
SidecarStateStore(out_dir) — one <rel>.state.json marker per note inside the output tree, so state travels with the transcripts.
Helpers: zos_home() -> Path ($ZOS_HOME or ~/.zos) · default_state_path(root=None) -> Path (<root or zos_home()>/voice/state.json).

CLI — `zos-voice`

Subcommands sweep, status, retry-failed; shared flags --root, --marker {state,sidecar}, --extensions; sweep/retry flags --command (required), --command-output {stdout,txt}, --timeout, --max, --dry-run. Prints a JSON report. Exit codes: 0 success · 1 ≥1 note failed · 2 usage error.

# See what WOULD process — writes nothing at all
zos-voice sweep ~/voice-notes ~/voice-transcripts --dry-run \
  --command "mlx_whisper {audio} --model mlx-community/whisper-large-v3-turbo --output-format txt --output-dir {workdir}" \
  --command-output txt

# Run it for real
zos-voice sweep ~/voice-notes ~/voice-transcripts \
  --command "mlx_whisper {audio} --model mlx-community/whisper-large-v3-turbo --output-format txt --output-dir {workdir}" \
  --command-output txt

# Where do things stand?
zos-voice status ~/voice-notes ~/voice-transcripts

# Re-attempt only the notes that failed last time
zos-voice retry-failed ~/voice-notes ~/voice-transcripts --command "..." --command-output txt

--command is substituted ({audio} = the audio file, {workdir} = a per-note scratch directory), split with shlex, and executed without a shell.

This page mirrors docs/API.md in the zos-voice repository, derived from the source at 0.1.0. Companion: platform overview · zos-core library API. Questions? Request early access.

zos-voice — voice-note ingestion.

The privacy contract

Transcribers — zos_voice.transcriber