zos-loops — loop engine.

The continuous-operation loop engine, derived from the source of zos_loops 0.1.0. Working code, June 2026.

zos-loops is the runtime that lets an AI system (or any system) run unattended, repeating work on an interval, safely. Unattended loops fail in predictable ways: two copies run at once, a crashed loop leaves a lock behind, a "healthy" heartbeat hides a writer that is recording stale state, or one bad step kills the whole cycle. zos-loops packages the counter-patterns as a small library and CLI.

Python 3.11+ · stdlib only · BUSL-1.1

Status: working code · 52 tests · CI green · BUSL-1.1 · repository private until public launch.

All state resolves under <root>/loops/<name>/, where <root> is, in precedence order: an explicit root argument → $ZOS_HOME → ~/.zos. Per-loop files: loop.lock, heartbeat.json, ticks.jsonl.

Key concepts

Runner — zos_loops.runner

LoopRunner(name, tick_fn=None, tick_cmd=None, steps=None, interval=60.0, root=None, failure_threshold=3, backoff_base=2.0, backoff_cap_s=3600.0, stop_event=None)

Provide exactly ONE of:

Arg        Tick becomes
tick_fn    a single step "tick" running the zero-arg callable
tick_cmd   a single step "tick" running the shell command
steps      the given Step sequence, sorted by priority (stable)
Step(name, fn=None, cmd=None, priority=0, timeout_s=None)

One named unit of tick work; exactly one of fn / cmd. Lower priority runs first; ties keep declaration order. cmd runs via subprocess (shell=True, output captured, non-zero exit = step failure).

TickResult: tick (int) · status ("ok" | "partial" | "failed" | "disabled") · duration_ms · steps [{name, status, ms[, error_type]}] · consecutive_failures · backoff_s

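from zos_loops import LoopRunner, Step

# process_blocked, promote_scheduled and run_ready_work stand for your own
# callables; they are placeholders, not part of zos_loops.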
runner = LoopRunner(
    "groomer",
    steps=[
        Step(name="unblock",  fn=process_blocked,   priority=0),  # first
        Step(name="schedule", fn=promote_scheduled, priority=1),
        Step(name="execute",  fn=run_ready_work,    priority=2),
    ],
    interval=300,            # seconds between ticks
    failure_threshold=3,     # whole-tick failures before backoff
)
runner.run()                 # acquires the lock, ticks until externally stopped
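failure_threshold, backoff_base and backoff_cap_s suggest capped exponential backoff after repeated whole-tick failures. The formula zos_loops actually uses is not documented on this page, so the sketch below assumes one: no extra delay until failure_threshold consecutive "failed" ticks, then interval × backoff_base^(failures − failure_threshold) seconds, capped at backoff_cap_s, with any other tick status resetting the streak. BackoffTracker is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class BackoffTracker:
    """Hypothetical helper; the formula is an assumption, not zos_loops' own."""
    interval: float = 60.0
    failure_threshold: int = 3
    backoff_base: float = 2.0
    backoff_cap_s: float = 3600.0
    consecutive_failures: int = 0

    def record(self, status: str) -> float:
        """Feed one tick status; return the extra delay (backoff_s) before the next tick."""
        if status == "failed":  # only whole-tick failures count
            self.consecutive_failures += 1
        else:  # any other status resets the streak (assumed)
            self.consecutive_failures = 0
        if self.consecutive_failures < self.failure_threshold:
            return 0.0
        exponent = self.consecutive_failures - self.failure_threshold
        return min(self.backoff_cap_s, self.interval * self.backoff_base ** exponent)
```

Under these assumptions, seven failed ticks in a row with the groomer settings (interval=300, failure_threshold=3) yield delays of 0, 0, 300, 600, 1200, 2400 and 3600 s, the last one capped.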

Health — zos_loops.health

loop_health(name, root=None, max_age_s=None) -> LoopHealth
all_health(root=None, max_age_s=None) -> list[LoopHealth]
list_loops(root=None) -> list[str]
Status     Meaning
"stopped"  no lock file: nothing claims to run
"running"  lock held by a live PID, and heartbeat fresh on both axes (file mtime and inner ts)
"stale"    everything else: a dead-PID lock (crash without cleanup), or a live holder whose heartbeat is missing, unreadable, mtime-stale or diverged

LoopHealth: name, status, detail (human reason), lock_holder (dict or None), heartbeat (HeartbeatCheck or None).

Heartbeat — zos_loops.heartbeat

write_heartbeat(name, root=None, interval_s=60.0, tick=0, now=None) -> Path
check_heartbeat(name, root=None, max_age_s=None, now=None) -> HeartbeatCheck
read_heartbeat(name, root=None) -> dict | None

write_heartbeat performs an atomic write (tmp file + rename) of {"ts": iso8601, "epoch": float, "pid": int, "interval_s": float, "tick": int}, so the file mtime and the payload move together on a healthy writer. check_heartbeat is the dual check: it tests both the file mtime and the inner ts against max_age_s, which defaults to 2.5 × the interval_s recorded in the heartbeat (150 s for the default 60 s interval).
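The tmp + rename pattern can be sketched with the standard library. write_heartbeat_atomic is a hypothetical name and the fsync call is an added precaution; the point is that os.replace swaps the file in one step, so a reader sees either the previous heartbeat or the new one, never a half-written file.

```python
import json
import os
import tempfile
import time
from datetime import datetime, timezone
from pathlib import Path


def write_heartbeat_atomic(path: Path, interval_s: float = 60.0, tick: int = 0) -> Path:
    """Hypothetical re-implementation of the tmp + rename heartbeat write."""
    now = time.time()
    payload = {
        "ts": datetime.fromtimestamp(now, timezone.utc).isoformat(),
        "epoch": now,
        "pid": os.getpid(),
        "interval_s": interval_s,
        "tick": tick,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".heartbeat.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic swap: old file or new file, never partial
    except BaseException:
        Path(tmp).unlink(missing_ok=True)  # do not leave temp debris behind
        raise
    return path
```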

Status                    File mtime  Inner ts  Meaning
"fresh"                   fresh       fresh     healthy
"stale"                   stale       -         writer not running
"diverged"                fresh       stale     writer runs but records stale state
"missing" / "unreadable"  -           -         no usable heartbeat

HeartbeatCheck: status, file_age_s, inner_age_s, max_age_s, payload, property fresh (True only for "fresh").

Lock — zos_loops.lock

acquire(name, root=None, owner_pid=None) -> (bool, str, dict | None)
release(name, root=None, owner_pid=None) -> bool
read_holder(name, root=None) -> dict | None
pid_alive(pid) -> bool
lock_path(name, root=None) -> Path

Tick log — zos_loops.ticklog

ticks.jsonl: one JSON object per tick — {ts, loop, tick, status, duration_ms, steps: [{name, status, ms[, error_type]}], consecutive_failures, backoff_s}. Shape only: no step output, no error messages, no payloads.

append_tick(name, record, root=None)         # best-effort, never raises into the runner
read_ticks(name, root=None) -> list[dict]    # skips corrupt lines
ticklog_path(name, root=None) -> Path
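The two tolerance rules (appending never raises into the runner, reading skips corrupt lines) can be sketched like this. The function names and the bool return value of the append helper are hypothetical.

```python
import json
from pathlib import Path


def append_tick_best_effort(path: Path, record: dict) -> bool:
    """Append one JSON line; swallow every error so logging never breaks a tick."""
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        line = json.dumps(record, separators=(",", ":"))
        with path.open("a", encoding="utf-8") as f:
            f.write(line + "\n")
        return True
    except Exception:
        return False


def read_ticks_tolerant(path: Path) -> list[dict]:
    """Parse ticks.jsonl, skipping blank, truncated or otherwise corrupt lines."""
    ticks: list[dict] = []
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except (OSError, ValueError):
        return ticks
    for line in lines:
        try:
            obj = json.loads(line)
        except ValueError:
            continue  # e.g. a torn write from a crash mid-append
        if isinstance(obj, dict):
            ticks.append(obj)
    return ticks
```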

Scheduler adapters — zos_loops.schedulers

Template generators: they return text and never call the OS. Use them to let an OS scheduler drive the cadence while the runner provides the safety.

launchd_plist(label, command, interval_s, stdout_path=None, stderr_path=None) -> str
systemd_unit(name, command, description=None) -> str
systemd_timer(name, interval_s, description=None) -> str    # fires <name>.service
crontab_line(command, every_minutes) -> str                 # floors at 1 min; ≥60 min rounds to hourly "0 */H"
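The documented crontab_line rules can be illustrated with a hypothetical generator. Cron's finest granularity is one minute, which is why the interval floors there. The exact text format, and flooring rather than rounding to whole hours, are assumptions; the example command uses zos-loop run with --once so that cron drives the cadence and the runner supplies the locking, with ./tick.sh as a placeholder.

```python
def crontab_line_sketch(command: str, every_minutes: float) -> str:
    """Hypothetical crontab generator: returns text, never touches the OS."""
    minutes = max(1, int(every_minutes))  # cron cannot schedule below 1 minute
    if minutes < 60:
        return f"*/{minutes} * * * * {command}"
    hours = minutes // 60  # >= 60 min: whole hours, floored (assumed)
    return f"0 */{hours} * * * {command}"


cmd = "zos-loop run groomer --cmd ./tick.sh --once"
line = crontab_line_sketch(cmd, 5)  # "*/5 * * * * zos-loop run groomer ..."
```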

CLI — zos-loop

zos-loop status  [--root PATH] [--max-age S] [--json]
zos-loop health <name> [--root PATH] [--max-age S] [--json]
    exit code: 0 running · 1 stopped · 2 stale
zos-loop run <name> --cmd CMD [--interval S] [--once | --max-ticks N]
                    [--failure-threshold K] [--root PATH]
zos-loop schedule <name> --cmd CMD --interval S
                  --format launchd|systemd|cron [--label L]
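Because zos-loop health encodes the result in its exit code, a monitoring job can branch on it without parsing output. This is a hypothetical wrapper: it assumes zos-loop is on PATH, and mapping any other exit code (for example a usage error) to "unknown" is an assumption.

```python
import shutil
import subprocess

# Exit codes documented for `zos-loop health <name>`.
HEALTH_BY_EXIT = {0: "running", 1: "stopped", 2: "stale"}


def health_from_exit(code: int) -> str:
    """Map a `zos-loop health` exit code to its status; anything else is unknown."""
    return HEALTH_BY_EXIT.get(code, "unknown")


def probe(name: str) -> str:
    """Run the CLI (assumed to be on PATH) and return the loop's status."""
    if shutil.which("zos-loop") is None:
        return "unknown"
    proc = subprocess.run(["zos-loop", "health", name], capture_output=True, text=True)
    return health_from_exit(proc.returncode)
```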

Path helpers (zos_loops.home): zos_home(root=None) · loops_root(root=None) · loop_dir(name, root=None) · safe_name(name).

This page mirrors docs/API.md in the zos-loops repository, derived from the source at 0.1.0. Companion: platform overview · zos-core library API.