zos-telemetry — measurement layer.

Session-quality telemetry, derived from the source of zos_telemetry 0.1.0. Working code, June 2026.

zos-telemetry is the "see what works" layer: it measures how well an AI system is actually serving you (corrections per session, cost, friction) and compares two systems head-to-head. It is content-free by construction: the store holds metrics and short detection markers, and no field in the schema can hold message text.

Python 3.11+ · stdlib only · BUSL-1.1

Status: working code · 28 tests · CI green · BUSL-1.1 · repository private until public launch.

Package zos_telemetry 0.1.0. Storage resolves from an explicit root argument → $ZOS_HOME/telemetry → ~/.zos/telemetry (telemetry_root() — it does not create the directory; writers do, lazily).
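
The resolution order above can be sketched in a few lines. This is a minimal illustration, not the package's telemetry_root(): the name resolve_root and the env parameter are hypothetical, and treating an empty ZOS_HOME as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_root(root=None, env=None):
    """Hypothetical stand-in for telemetry_root(): resolves a path, never creates it."""
    env = os.environ if env is None else env
    if root is not None:
        return Path(root)                      # 1. explicit root argument wins
    zos_home = env.get("ZOS_HOME")
    if zos_home:                               # assumption: empty string counts as unset
        return Path(zos_home) / "telemetry"    # 2. $ZOS_HOME/telemetry
    return Path.home() / ".zos" / "telemetry"  # 3. default base ~/.zos
```

Passing env explicitly keeps the sketch easy to exercise without touching the real process environment.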

Key concepts

Corrections per session — the headline effectiveness proxy: how often the user had to correct or redirect the assistant. Measured from USER messages (not the assistant's reply format), so it is identical across the systems being compared. Lower is better.
Content-free markers — raw text is reduced to short pattern-index tokens (corr.strong.2) at capture time and then discarded. Marker length and count are capped so markers cannot smuggle prose. The test suite asserts stored bytes contain no message text.
Conservative, versioned detection — the detector under-counts by design and every stored row records its detector_version; refine patterns over time rather than chase individual false hits.
Directional, not a verdict — small-sample comparisons are explicitly labeled; a clean sweep over a few sessions is a hint, not a conclusion.

Event capture — `zos_telemetry.events`

SessionLog(root=None)

Append-only store under the resolved root, with two month-bucketed sinks:

| Sink | Contents |
| --- | --- |
| `events-YYYY-MM.jsonl` | one row per event |
| `sessions-YYYY-MM.jsonl` | one summary row per closed session |
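
As an illustration of the month bucketing, the sketch below maps a timestamp to a sink file name. The helper sink_name is hypothetical, and it assumes the bucket is chosen from the row's own ts rather than from the time of writing, which the source does not state.

```python
from datetime import datetime, timezone


def sink_name(prefix, ts):
    """Map an ISO-8601 'Z' timestamp to a month-bucketed file name."""
    moment = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return f"{prefix}-{moment:%Y-%m}.jsonl"
```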

append(*, session_id, system, kind, role=None, ts=None, context=None, model=None, tokens_in=None, tokens_out=None, cost_usd=None, markers=None) -> dict — append one event. Keyword-only and schema-closed: there is no parameter through which message content can be stored. ts defaults to now (UTC, ISO-8601 Z). Markers are validated: max 16 per event, max 32 chars each.
append_record(record) -> dict — integration convenience. Rejects content-bearing keys (content, text, message, body, transcript, prompt, completion) and any key outside the schema, then routes through append.
close_session(session_id, ts=None) -> dict | None — fold the session's events into one summary row, exactly once. Idempotent: re-closing returns the existing row. None if the session has no events.
read_events(session_id=None): all events across every monthly sink (oldest file first), optionally filtered to one session.
read_sessions(): all closed-session summary rows.
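
The schema-closed checks described for append and append_record might look roughly like the sketch below. It is an illustration under assumptions: check_record is a hypothetical name, the allowed-key set is inferred from the append signature, and raising ValueError is a guess at the error type, which the source does not specify.

```python
# Keys named in the documentation as content-bearing.
CONTENT_KEYS = {"content", "text", "message", "body", "transcript", "prompt", "completion"}
# Assumption: the closed schema equals the keyword parameters of append().
SCHEMA_KEYS = {"session_id", "system", "kind", "role", "ts", "context", "model",
               "tokens_in", "tokens_out", "cost_usd", "markers"}
MAX_MARKERS = 16        # documented cap per event
MAX_MARKER_CHARS = 32   # documented cap per marker


def check_record(record):
    """Raise ValueError if the record could carry message content."""
    banned = record.keys() & CONTENT_KEYS
    if banned:
        raise ValueError(f"content-bearing keys rejected: {sorted(banned)}")
    unknown = record.keys() - SCHEMA_KEYS
    if unknown:
        raise ValueError(f"keys outside the schema: {sorted(unknown)}")
    markers = record.get("markers") or []
    if len(markers) > MAX_MARKERS:
        raise ValueError("too many markers")
    if any(not isinstance(m, str) or len(m) > MAX_MARKER_CHARS for m in markers):
        raise ValueError("marker is not a short string")
    return record
```

Checking the banned keys before the generic unknown-key test gives integrators a more specific error when they accidentally pass message text.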

Writers are fail-loud (the library raises); wrap calls at integration points where the host process must never block. Sink files are chmod 0600 best-effort.
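
One way to wrap a fail-loud writer at an integration point is sketched below. The helper never_block and the logger name are hypothetical; write stands for any writer callable, for example a SessionLog's append method.

```python
import logging

log = logging.getLogger("telemetry_hook")  # hypothetical logger name


def never_block(write, **fields):
    """Call a fail-loud writer; log and swallow any failure instead of raising."""
    try:
        return write(**fields)
    except Exception:  # deliberate catch-all: telemetry must not break the host
        log.exception("telemetry write failed; event dropped")
        return None
```

The trade-off is explicit: a dropped event is preferred over a crashed host, and the log line keeps the failure visible.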

Event schema

{
  "schema": 1,
  "ts": "2026-06-10T18:00:00Z",
  "session_id": "s1",
  "system": "system_a",
  "context": "work",
  "role": "user",
  "kind": "user_message",
  "model": "model-x",
  "tokens_in": 1200,
  "tokens_out": 300,
  "cost_usd": 0.04,
  "markers": ["corr.strong.2"]
}

kind is free-form, but it falls into three cases for aggregation: user_message counts as a user turn, and correction detection applies; assistant_message counts as a turn; anything else (e.g. cost_tick) contributes only tokens and cost. markers is the only detection surface: short, content-free tokens, never message text.

Closed-session summary rows add: detector_version, turns, user_turns, corrections, events, first_ts, last_ts, duration_s, ts_closed, and the dominant model (by cost, falling back to frequency).
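
The kind semantics can be made concrete with a small fold over one session's events. This is a sketch, not the package's code: fold_events is a hypothetical name, and it assumes that user turns also count toward turns, which the source implies but does not state outright.

```python
def fold_events(events):
    """Fold one session's events into counts, following the documented kind rules."""
    summary = {"events": 0, "turns": 0, "user_turns": 0, "corrections": 0,
               "tokens_in": 0, "tokens_out": 0, "cost_usd": 0.0}
    for event in events:
        summary["events"] += 1
        kind = event.get("kind")
        if kind in ("user_message", "assistant_message"):
            summary["turns"] += 1          # assumption: user turns are turns too
        if kind == "user_message":
            summary["user_turns"] += 1
            markers = event.get("markers") or []
            if any(m.startswith("corr.") for m in markers):
                summary["corrections"] += 1
        # Every kind, including e.g. cost_tick, contributes tokens and cost.
        summary["tokens_in"] += event.get("tokens_in") or 0
        summary["tokens_out"] += event.get("tokens_out") or 0
        summary["cost_usd"] += event.get("cost_usd") or 0.0
    return summary
```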

Correction detector — `zos_telemetry.detector`

A "correction" is a user turn that reads as the user correcting or redirecting the assistant. Conservative by design; versioned via DETECTOR_VERSION (currently 1).

is_correction(text: str) -> bool
correction_markers(text: str) -> list[str]
detect_corrections(events) -> list[dict]
corrections_per_session(events) -> dict[str, int]

is_correction — single-turn classification. Strong signals (explicit wrongness/undo language anywhere in the turn) and opener signals (turn begins with "no"/"wait"/"actually"/"nope").
correction_markers — capture-time reduction to content-free tokens (corr.strong.<i> / corr.opener.<i>, pattern indices). Empty list = no signal. Call this, store the result, discard the text.
detect_corrections — the subset of events that are corrections: kind == "user_message" with at least one corr.* marker.
corrections_per_session — correction count per session_id, including zeros for sessions with no detected corrections.
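
To illustrate the capture-time reduction and the per-session count, here is a minimal sketch. The regular expressions are invented for illustration and are not the package's pattern lists; markers_for and count_corrections are hypothetical names, and zero-based pattern indices are an assumption.

```python
import re

# Hypothetical patterns; the real lists live in zos_telemetry.detector.
STRONG = [re.compile(p, re.I) for p in (r"\bthat'?s (?:wrong|incorrect)\b", r"\bundo that\b")]
OPENER = [re.compile(p, re.I) for p in (r"^\s*no\b", r"^\s*wait\b", r"^\s*actually\b", r"^\s*nope\b")]


def markers_for(text):
    """Reduce text to content-free tokens; the caller then discards the text."""
    found = [f"corr.strong.{i}" for i, p in enumerate(STRONG) if p.search(text)]
    found += [f"corr.opener.{i}" for i, p in enumerate(OPENER) if p.match(text)]
    return found


def count_corrections(events):
    """Correction count per session_id, with zeros for sessions that have none."""
    counts = {}
    for event in events:
        sid = event["session_id"]
        counts.setdefault(sid, 0)
        is_user = event.get("kind") == "user_message"
        if is_user and any(m.startswith("corr.") for m in event.get("markers") or []):
            counts[sid] += 1
    return counts
```

With these illustrative patterns, markers_for("No, Tuesday works") yields one opener marker, which shows the kind of false positive an opener rule can produce.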

Known limitations (kept, documented): the detector under-counts; patterns are English-only; opener false positives are possible ("No, Tuesday works" in answer to a question counts as a correction, but because the rule is applied symmetrically across systems, comparisons stay fair even when absolute counts drift); questions and approvals containing "right" do not fire; detection is heuristic, not semantic.

Aggregation — `zos_telemetry.aggregate`

Pure functions; no I/O; recomputed whole each call, so re-running never double-counts.

session_summary(events, session_id=None) -> dict
per_day(events) -> list[dict]
aggregate_sessions(rows) -> dict[str, dict]

session_summary — whole-session metrics: turns, user_turns, corrections, tokens_in/out, cost_usd, duration_s (first→last event timestamp), dominant model.
per_day — token/cost rollups keyed by (day, system, context, model); rows sorted; unparseable timestamps land under day "unknown".
aggregate_sessions — per-system totals and means over closed-session rows: sessions, corrections, cost_usd, turns, tokens_in/out, duration_s, plus corrections_per_session, cost_per_session, turns_per_session.
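
A per-day rollup of the kind described for per_day could be sketched as follows. The names day_of and rollup_per_day are hypothetical, the timestamp format is assumed to match the event schema example, and the row layout is illustrative rather than the package's actual output.

```python
from collections import defaultdict
from datetime import datetime

KEY_FIELDS = ("day", "system", "context", "model")


def day_of(ts):
    """Return YYYY-MM-DD for an ISO-8601 'Z' timestamp, or 'unknown' if unparseable."""
    try:
        return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").date().isoformat()
    except (TypeError, ValueError):
        return "unknown"


def rollup_per_day(events):
    """Sum tokens and cost per (day, system, context, model); pure, no I/O."""
    totals = defaultdict(lambda: {"tokens_in": 0, "tokens_out": 0, "cost_usd": 0.0})
    for event in events:
        key = (day_of(event.get("ts")), event.get("system"),
               event.get("context"), event.get("model"))
        bucket = totals[key]
        bucket["tokens_in"] += event.get("tokens_in") or 0
        bucket["tokens_out"] += event.get("tokens_out") or 0
        bucket["cost_usd"] += event.get("cost_usd") or 0.0
    rows = [dict(zip(KEY_FIELDS, key), **vals) for key, vals in totals.items()]
    # str() keeps the sort total even when a key field is None.
    rows.sort(key=lambda row: tuple(str(row[f]) for f in KEY_FIELDS))
    return rows
```

Because the function rebuilds its totals from the full event list on every call, running it twice over the same events gives the same rows, which matches the "never double-counts" property above.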

Comparison — `zos_telemetry.compare`

compare(system_a, system_b, window=None, *, sessions=None, root=None) -> dict
to_markdown(report) -> str
SMALL_SAMPLE_N = 20

compare — compares closed-session rows (from sessions= if given, else the SessionLog at root). window = trailing days; None = all time. Report keys: system_a / system_b (per-side stats), window_days, sessions_compared, headline (corrections/session, lower is better), directional, note, detector_version, generated_at.
to_markdown — renders the report: headline, warning block when directional, metric table.
SMALL_SAMPLE_N — below this many sessions on either side the report sets directional: true and carries the "DIRECTIONAL, NOT A VERDICT" note.
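
The small-sample rule can be sketched over a list of closed-session rows. This is a simplified illustration, not the package's compare: compare_rows is a hypothetical name, the report carries only a subset of the documented keys, and the exact note wording beyond the quoted phrase is not reproduced.

```python
SMALL_SAMPLE_N = 20  # value taken from the documented constant


def compare_rows(system_a, system_b, rows):
    """Compare corrections per session for two systems; flag small samples."""
    def side(name):
        mine = [r for r in rows if r.get("system") == name]
        total = sum(r.get("corrections", 0) for r in mine)
        mean = total / len(mine) if mine else None
        return {"sessions": len(mine), "corrections": total,
                "corrections_per_session": mean}

    a, b = side(system_a), side(system_b)
    # Directional if EITHER side has fewer than SMALL_SAMPLE_N sessions.
    directional = min(a["sessions"], b["sessions"]) < SMALL_SAMPLE_N
    return {"system_a": a, "system_b": b, "directional": directional,
            "note": "DIRECTIONAL, NOT A VERDICT" if directional else ""}
```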

CLI — `zos-telemetry`

All subcommands accept --root DIR.

zos-telemetry append --session-id S --system NAME --kind KIND
    [--role R] [--context C] [--model M]
    [--tokens-in N] [--tokens-out N] [--cost-usd X] [--ts ISO]
    [--marker TOKEN]... [--mark-stdin]
zos-telemetry close --session-id S          # exit 1 if no events
zos-telemetry sessions                      # JSONL on stdout
zos-telemetry compare SYSTEM_A SYSTEM_B [--window DAYS] [--json]
zos-telemetry --version

--mark-stdin reads the user message on stdin, stores only its correction markers, and discards the text — the supported way to feed the detector from a shell hook without persisting content.

# capture (e.g. from each system's stop hook):
echo "$USER_MESSAGE" | zos-telemetry append \
  --session-id "$SESSION_ID" --system system_a \
  --kind user_message --mark-stdin          # markers stored, text discarded

# close the session into one summary row, then compare:
zos-telemetry close --session-id "$SESSION_ID"
zos-telemetry compare system_a system_b --window 30
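
When the hook is written in Python rather than shell, the same capture step could be driven through the CLI as sketched below. The helpers append_argv and capture_user_message are hypothetical, the 5-second timeout is an arbitrary choice, and placing --root after the subcommand's other flags is an assumption about argument order.

```python
import subprocess


def append_argv(session_id, system, root=None):
    """Build the documented append command for a user message."""
    argv = ["zos-telemetry", "append", "--session-id", session_id,
            "--system", system, "--kind", "user_message", "--mark-stdin"]
    if root is not None:
        argv += ["--root", root]  # assumption: --root is accepted in this position
    return argv


def capture_user_message(text, session_id, system, root=None):
    """Pipe the message to the CLI on stdin; report failure instead of raising."""
    try:
        subprocess.run(append_argv(session_id, system, root), input=text,
                       text=True, check=True, timeout=5)
    except (OSError, subprocess.SubprocessError):
        return False  # CLI missing, non-zero exit, or timeout
    return True
```

The message text travels only over stdin, so it never appears in the process argument list.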

Environment variables

| Variable | Effect |
| --- | --- |
| `ZOS_HOME` | Base directory; storage goes to `$ZOS_HOME/telemetry` when no explicit root is given. Default base: `~/.zos`. |

This page mirrors docs/API.md in the zos-telemetry repository, derived from the source at 0.1.0. Companion: platform overview · zos-evals.
