zos-gateway — REST API.

The trust layer, derived from the source of zos_gateway 0.1.0. Working code, June 2026.

zos-gateway is the trust layer: a self-hosted HTTP service every model call passes through. It speaks the same APIs your tools already use, and governs each call with the active context's firewall and budget.

FastAPI · default 127.0.0.1:8788 · BUSL-1.1

The per-call pipeline

Both proxy endpoints run the same six stages:

client request
    |
(1) resolve context      X-ZOS-Context header -> $ZOS_HOME/contexts/<name>.yml
(2) firewall: REQUEST    scan all message text against outbound_deny
    |                      violation + enforce -> 403 · violation + warn -> header, continue
(3) budget check         budget.daily_tokens vs usage.db (UTC day) -> 429 if exceeded
(4) forward              BYO key (client header) or gateway env key
    |                      streaming: SSE passthrough with in-flight scan
(5) firewall: RESPONSE   scan upstream text the same way -> 502 / header
(6) audit append         $ZOS_HOME/gateway/audit.jsonl (metadata only)
    |
client response
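
Read as control flow, the stages short-circuit in order. The sketch below is a simplified model of that ordering for a non-streaming call, not the gateway's code: CallFacts and client_status are hypothetical names, and each stage's result is passed in as a plain value rather than computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallFacts:
    """Hypothetical summary of what each stage found for one call."""
    context_exists: bool = True
    request_violations: int = 0
    daily_limit: Optional[int] = None   # None models "no budget key"
    used_today: int = 0
    upstream_status: int = 200
    response_violations: int = 0


def client_status(facts: CallFacts, mode: str = "enforce") -> int:
    """Return the status a non-streaming client would see in this model."""
    enforce = mode != "warn"            # anything other than warn means enforce
    if not facts.context_exists:        # stage 1: resolve context
        return 404
    if facts.request_violations and enforce:   # stage 2: request firewall
        return 403
    if facts.daily_limit is not None and facts.used_today >= facts.daily_limit:
        return 429                      # stage 3: pre-flight budget check
    # stage 4: the call is forwarded upstream here
    if facts.response_violations and enforce:  # stage 5: response firewall
        return 502
    return facts.upstream_status        # stage 6 (audit) runs on every path
```

In warn mode the same facts that produce a 403 or 502 let the call through, with the verdict reported in headers instead.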

Firewall text extraction

What stages (2) and (5) actually scan:

Endpoints

Request headers (client → gateway)

| Header | Required | Meaning |
| --- | --- | --- |
| X-ZOS-Context | no (default: default) | selects $ZOS_HOME/contexts/<name>.yml; unknown name → 404 |
| x-api-key | for /v1/messages BYO | client-supplied Anthropic key; falls back to gateway ANTHROPIC_API_KEY |
| Authorization: Bearer | for /v1/chat/completions BYO | client-supplied OpenAI-route key; falls back to gateway OPENAI_API_KEY |
| anthropic-version | no (default: 2023-06-01) | forwarded upstream on /v1/messages |
| anthropic-beta | no | forwarded upstream on /v1/messages when present |
| x-request-id | no | client-chosen request id; otherwise a uuid4 hex is generated |
| accept | no | forwarded upstream (default: application/json) |

No other client headers are forwarded upstream.
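
The forwarding rules for /v1/messages amount to an allow-list. A minimal sketch, assuming a plain dict of client headers; upstream_headers_for_messages is a hypothetical helper, and it covers only the headers named in the table (anything else the gateway sets on the upstream request is out of scope here).

```python
from typing import Dict


def upstream_headers_for_messages(client_headers: Dict[str, str],
                                  api_key: str) -> Dict[str, str]:
    """Build the upstream header set from the documented allow-list."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    seen = {name.lower(): value for name, value in client_headers.items()}
    forwarded = {
        "x-api-key": api_key,  # already resolved: BYO or gateway-held
        "anthropic-version": seen.get("anthropic-version", "2023-06-01"),
        "accept": seen.get("accept", "application/json"),
    }
    if "anthropic-beta" in seen:  # forwarded only when the client sent it
        forwarded["anthropic-beta"] = seen["anthropic-beta"]
    return forwarded
```

Headers such as X-ZOS-Context, x-request-id, or cookies never appear in the result, which is the point of building the set from scratch rather than filtering the client's.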

POST /v1/messages

Anthropic Messages API passthrough (SSE streaming via "stream": true). Key: x-api-key (BYO) → gateway ANTHROPIC_API_KEY → 401. Body: a standard Messages API JSON object, forwarded byte-for-byte to ANTHROPIC_BASE_URL + /v1/messages; the gateway reads model (audit), stream (streaming switch), system and messages (firewall scan) — it never rewrites the body. Success returns the upstream status and body untouched, plus the x-zos-* headers below.

curl -s http://127.0.0.1:8788/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "X-ZOS-Context: work" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":64,
       "messages":[{"role":"user","content":"Say hello."}]}'
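
The same call from Python, as a sketch using only the standard library. build_messages_request and send are hypothetical helper names; the URL, headers, and body mirror the curl example above.

```python
import json
import os
import urllib.request


def build_messages_request(prompt: str, context: str = "work",
                           base_url: str = "http://127.0.0.1:8788") -> urllib.request.Request:
    """Build (but do not send) a POST to the gateway's /v1/messages."""
    body = {
        "model": "claude-sonnet-4-5",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
        "X-ZOS-Context": context,
    }
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if api_key:  # BYO key; leave it out to rely on the gateway-held key
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        base_url + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send the request; return (status, x-zos-request-id, parsed body)."""
    with urllib.request.urlopen(request) as response:
        return (response.status,
                response.headers.get("x-zos-request-id"),
                json.load(response))
```

Note that urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so gateway rejections such as 403 or 429 arrive as exceptions whose body can be read from the exception object.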

POST /v1/chat/completions

OpenAI-compatible passthrough. Disabled (501) until OPENAI_BASE_URL is set — and that base must include the version path (e.g. https://api.openai.com/v1); the gateway appends /chat/completions. Key: Authorization: Bearer (BYO) → gateway OPENAI_API_KEY → 401. Same pipeline, same errors, streaming included.

GET /healthz

Liveness, no auth, always 200:

{"status":"ok","version":"0.1.0","zos_core":true,"firewall_mode":"enforce"}

zos_core reports whether the optional engine package was importable at startup; firewall_mode is the resolved mode.

GET /v1/contexts

No auth. Lists every *.yml/*.yaml stem under $ZOS_HOME/contexts/ plus the always-resolvable built-in default. Per context: name, isolation, outbound_deny_count, budget_daily_tokens (null = unlimited). A yml that exists but fails to load reports {"name": ..., "error": "failed to load"}.

GET /v1/audit/tail?n=50

Last n audit records, newest last (default 50, 1–1000; out-of-range → 422). Auth: Authorization: Bearer $ZOS_ADMIN_TOKEN. Unset token on the gateway → 503 zos_admin_disabled (endpoint disabled); wrong/missing bearer → 401 zos_unauthorized. Returns {"records":[...], "count": N}; an unparseable line surfaces as {"_unparseable": true}.
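
The tail semantics can be modelled in a few lines. This is a sketch of the described behaviour, not the gateway's handler: tail_records is a hypothetical name, it takes the log's lines as a list, and it raises ValueError where the endpoint would answer 422. That count equals the number of returned records is an assumption.

```python
import json
from typing import Any, Dict, List


def tail_records(lines: List[str], n: int = 50) -> Dict[str, Any]:
    """Return the last n audit records, newest last."""
    if not 1 <= n <= 1000:
        raise ValueError("n must be between 1 and 1000")
    kept = [line for line in lines if line.strip()]  # ignore blank lines
    records = []
    for line in kept[-n:]:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # A damaged line is reported, not dropped.
            records.append({"_unparseable": True})
    return {"records": records, "count": len(records)}
```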

Gateway response headers (proxy endpoints)

| Header | When | Value |
| --- | --- | --- |
| x-zos-request-id | always | the client's x-request-id or a generated uuid4 hex |
| x-zos-context | always | the resolved context name |
| x-zos-firewall-request | always | ok, or warn; violations=N (a request-stage block returns the 403 body instead) |
| x-zos-firewall-response | non-streaming success only | ok, skipped, or warn; violations=N (streamed verdicts land in the audit log because headers are already sent) |
| x-zos-firewall-mode | warn mode only | warn |
| x-upstream-request-id | when upstream sent request-id | the provider's request id |
| content-type | always | passed through from upstream |

API-key resolution — BYO vs gateway-held

Per request, per route — the first hit wins:

| Route | 1. BYO (client request) | 2. Gateway-held (env) | 3. Neither |
| --- | --- | --- | --- |
| /v1/messages | x-api-key header | ANTHROPIC_API_KEY | 401 zos_missing_api_key |
| /v1/chat/completions | Authorization: Bearer | OPENAI_API_KEY | 401 zos_missing_api_key |

The chosen source lands in the audit record as key_source (byo | gateway; none on calls rejected before key resolution). Keys are never logged.
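
The first-hit-wins order can be written as a small function. A sketch under stated assumptions: resolve_key is a hypothetical name, and returning None stands in for the 401 zos_missing_api_key rejection.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Route -> environment variable holding the gateway-held key.
_ENV_KEY = {
    "/v1/messages": "ANTHROPIC_API_KEY",
    "/v1/chat/completions": "OPENAI_API_KEY",
}


def resolve_key(route: str, headers: Dict[str, str],
                env: Optional[Mapping[str, str]] = None) -> Optional[Tuple[str, str]]:
    """Return (key, key_source), or None when no key is available."""
    env = os.environ if env is None else env  # read at call time
    seen = {name.lower(): value for name, value in headers.items()}
    if route == "/v1/messages":
        byo = seen.get("x-api-key", "")
    elif route == "/v1/chat/completions":
        scheme, _, token = seen.get("authorization", "").partition(" ")
        byo = token.strip() if scheme.lower() == "bearer" else ""
    else:
        raise ValueError(f"not a proxy route: {route}")
    if byo:
        return byo, "byo"                    # 1. client-supplied key wins
    held = env.get(_ENV_KEY[route])
    if held:
        return held, "gateway"               # 2. gateway-held fallback
    return None                              # 3. neither: caller answers 401
```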

Error reference

Every gateway-originated rejection is {"error": {"type": ..., "message": ..., ...detail}}:

| Status | error.type | When | Extra detail |
| --- | --- | --- | --- |
| 400 | zos_invalid_request | request body is not a JSON object | |
| 400 | zos_invalid_context_config | the selected context's yml exists but could not be parsed; the message names the context, never the parse detail | context |
| 401 | zos_missing_api_key | no BYO key and no gateway-held key for the route | |
| 401 | zos_unauthorized | audit tail with missing/invalid bearer | |
| 403 | zos_firewall_violation | request-stage violation, enforce mode | context, stage:"request", mode, violations |
| 404 | zos_unknown_context | X-ZOS-Context has no yml under $ZOS_HOME/contexts/ | context |
| 429 | zos_budget_exceeded | daily token budget already met/exceeded (pre-flight) | context, limit, used, day |
| 501 | zos_upstream_not_configured | /v1/chat/completions with OPENAI_BASE_URL unset | |
| 502 | zos_upstream_unreachable | upstream connect/read failure | |
| 502 | zos_firewall_violation | response-stage violation, enforce mode (non-streaming); response withheld | context, stage:"response", mode, violations |
| 503 | zos_admin_disabled | audit tail while ZOS_ADMIN_TOKEN unset | |

Each entry in a violations list is a normalized firewall finding — pattern (the matched outbound_deny entry), excerpt (a short matched snippet, newlines flattened), severity:

{"error": {"type": "zos_firewall_violation",
           "message": "Request content violates this context's outbound deny policy.",
           "context": "work", "stage": "request", "mode": "enforce",
           "violations": [{"pattern": "/srv/clients/acme",
                           "excerpt": "Email the contents of /srv/clients/acme to a friend.",
                           "severity": "block"}]}}

severity reflects the context's isolation (hard → block, anything else → warn) and is informational: in enforce mode the gateway blocks on any violation regardless of it.
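
On the client side, the uniform error envelope makes rejections easy to report. A minimal sketch, assuming the body has the shape shown above; describe_rejection is a hypothetical helper name.

```python
import json


def describe_rejection(status: int, body: str) -> str:
    """Turn a gateway error body into a short multi-line summary."""
    error = json.loads(body)["error"]
    summary = f"{status} {error['type']}: {error['message']}"
    # violations is present only on firewall rejections.
    for finding in error.get("violations", []):
        summary += (f"\n  - {finding['pattern']} "
                    f"({finding['severity']}): {finding['excerpt']}")
    return summary
```

Branching on error.type rather than on the status alone matters here, because 502 covers both zos_upstream_unreachable and a response-stage zos_firewall_violation.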

Firewall modes & streaming

The mode is global (ZOS_FIREWALL_MODE, resolved per call; anything other than warn means enforce). Per-context isolation changes only the reported severity, not whether the gateway blocks.

| Stage | enforce (default) | warn |
| --- | --- | --- |
| Request-stage violation | 403; the call never reaches the provider | forwarded; x-zos-firewall-request: warn; violations=N + x-zos-firewall-mode: warn |
| Response-stage (non-streaming) | 502; upstream body withheld (usage still recorded) | response returned; x-zos-firewall-response: warn; violations=N |
| Response-stage (streaming) | stream terminated mid-flight with an SSE error event | stream continues; verdict recorded in the audit log only |

When the request carries "stream": true and the upstream answers with a success status, the gateway returns an SSE passthrough (upstream content-type preserved). Each complete SSE event is parsed; text deltas are accumulated and the accumulated text is re-scanned on every delta. On a violation in enforce mode the gateway emits one final SSE event and closes the stream:

event: error
data: {"type": "error", "error": {"type": "zos_firewall_violation",
       "message": "Stream terminated: response content violates this context's outbound deny policy.",
       "context": "work", "stage": "response", "violations": [...]}}

Token usage observed in the stream is recorded against the budget and the audit record (streamed: true) in all cases — including a terminated stream and a client disconnect. If the upstream errors before the stream starts, the error body passes through as a normal non-streaming response.
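
Re-scanning the accumulated text, rather than each delta alone, is what catches a denied string that straddles two deltas. The sketch below illustrates that idea and is not the gateway's implementation: guard_stream and text_delta are hypothetical names, scan stands in for the context's firewall check, the event parsing assumes Anthropic-style text_delta payloads, and withholding the event that triggered the violation is an assumption.

```python
import json
from typing import Callable, Iterable, Iterator, List


def text_delta(event: str) -> str:
    """Return the text carried by one SSE event, or "" if it has none."""
    for line in event.splitlines():
        if not line.startswith("data:"):
            continue
        try:
            payload = json.loads(line[5:].strip())
        except json.JSONDecodeError:
            continue
        delta = payload.get("delta") if isinstance(payload, dict) else None
        if isinstance(delta, dict) and delta.get("type") == "text_delta":
            return delta.get("text", "")
    return ""


def guard_stream(events: Iterable[str], scan: Callable[[str], List[dict]],
                 mode: str = "enforce", context: str = "work") -> Iterator[str]:
    """Pass SSE events through, scanning the accumulated text on each delta."""
    accumulated = ""
    for event in events:
        delta = text_delta(event)
        if delta:
            accumulated += delta
            violations = scan(accumulated)  # whole text so far, not just delta
            if violations and mode != "warn":
                error = {"type": "error", "error": {
                    "type": "zos_firewall_violation",
                    "message": "Stream terminated: response content violates "
                               "this context's outbound deny policy.",
                    "context": context, "stage": "response",
                    "violations": violations}}
                yield "event: error\ndata: " + json.dumps(error) + "\n\n"
                return                      # close the stream
        yield event
```

In warn mode the generator yields every event unchanged; a real gateway would record the verdict in the audit log at that point.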

Budgets

A context opts in with budget.daily_tokens in its yml; no key = unlimited. Usage (observed upstream input + output tokens) is tracked in sqlite at $ZOS_HOME/gateway/usage.db, keyed by (context, UTC date) — the counter "resets" daily because a new UTC date is a new row. The check is pre-flight: a request is refused (429) only when today's usage already meets/exceeds the limit, so the call that crosses the line completes and the next is refused. Usage is recorded after every completed call, including firewall-withheld responses (the provider did the work) and terminated streams.

budget:
  daily_tokens: 200000
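
The pre-flight rule and the (context, UTC date) keying can be sketched with the standard sqlite3 module. The table layout and every function name here are hypothetical; only the behaviour follows the description above.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def utc_today() -> str:
    """Current UTC date, used as the per-day row key."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d")


def open_usage(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS usage ("
        " context TEXT NOT NULL, day TEXT NOT NULL,"
        " tokens INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (context, day))"
    )
    return db


def used_on(db: sqlite3.Connection, context: str, day: str) -> int:
    row = db.execute("SELECT tokens FROM usage WHERE context = ? AND day = ?",
                     (context, day)).fetchone()
    return row[0] if row else 0  # a new UTC day has no row yet: the "reset"


def over_budget(db: sqlite3.Connection, context: str,
                limit: Optional[int], day: str) -> bool:
    """Pre-flight check: refuse only if usage already meets the limit."""
    return limit is not None and used_on(db, context, day) >= limit


def record_usage(db: sqlite3.Connection, context: str, day: str,
                 input_tokens: int, output_tokens: int) -> None:
    db.execute("INSERT OR IGNORE INTO usage (context, day) VALUES (?, ?)",
               (context, day))
    db.execute("UPDATE usage SET tokens = tokens + ? WHERE context = ? AND day = ?",
               (input_tokens + output_tokens, context, day))
    db.commit()
```

Because the check runs before forwarding and usage is only known afterwards, a context can overshoot its limit by at most one call's worth of tokens per day.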

Audit JSONL record schema

One JSON object per line, appended (O_APPEND, file mode 0600) to $ZOS_HOME/gateway/audit.jsonl. Metadata only — never message content, never API keys (a defensive schema guard refuses forbidden field names).

| Field | Type | Meaning |
| --- | --- | --- |
| ts | str | UTC ISO-8601 timestamp |
| request_id | str | client x-request-id or generated uuid4 hex |
| endpoint | str | /v1/messages or /v1/chat/completions |
| context | str | the resolved context name (on 400/404 records: the requested, possibly unresolved, name) |
| model | str or null | model from the request body |
| key_source | str | byo, gateway, or none |
| input_tokens / output_tokens | int or null | upstream-reported token counts (null when unknown) |
| firewall.request | str | ok, warn, or block |
| firewall.response | str | ok, warn, block, or skipped |
| firewall.request_violations / .response_violations | int | violation counts per stage |
| latency_ms | int or null | wall time from forward to upstream completion (null on pre-forward rejections) |
| status | int | the HTTP status returned to the client |
| streamed | bool | whether the call was an SSE stream |
| reason | str | only on early-rejection records: invalid_json, invalid_context_config, unknown_context, or missing_api_key; absent otherwise |

Records are written for every proxy call, rejected or forwarded: all forwarded calls (success or upstream error), the gateway's own 403 / 429 / 502 rejections, and the pre-pipeline rejections (400 invalid JSON / invalid context config, 404 unknown context, 401 missing API key) — which carry the short reason field so e.g. auth-probing attempts are visible to operators. As always: zero message content, zero keys.

{"ts": "2026-06-09T12:00:00+00:00", "request_id": "9be0...", "endpoint": "/v1/messages",
 "context": "work", "model": "claude-sonnet-4-5", "key_source": "byo",
 "input_tokens": 10, "output_tokens": 5,
 "firewall": {"request": "ok", "response": "ok", "request_violations": 0, "response_violations": 0},
 "latency_ms": 412, "status": 200, "streamed": false}
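
An append with O_APPEND and mode 0600, plus a field-name guard, might look like the sketch below. append_audit is a hypothetical name, the FORBIDDEN_FIELDS set is an illustrative guess rather than the gateway's actual list, and the guard checks top-level keys only.

```python
import json
import os
from datetime import datetime, timezone

# Illustrative deny-list of field names; the real guard's list is not documented here.
FORBIDDEN_FIELDS = {"messages", "system", "content",
                    "api_key", "x-api-key", "authorization"}


def append_audit(path: str, record: dict) -> None:
    """Append one metadata-only record as a single JSON line."""
    bad = FORBIDDEN_FIELDS.intersection(key.lower() for key in record)
    if bad:
        raise ValueError(f"refusing to audit forbidden fields: {sorted(bad)}")
    stamped = {"ts": datetime.now(timezone.utc).isoformat(timespec="seconds"),
               **record}
    line = json.dumps(stamped) + "\n"
    # O_APPEND makes each write land at the end of the file;
    # 0o600 applies only when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
```

Writing each record with a single os.write call keeps concurrent appenders from interleaving partial lines in the common case.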

Environment variables

Resolved at call time, not import time, so a process can react to environment changes.

| Env var | Default | Meaning |
| --- | --- | --- |
| ZOS_HOME | ~/.zos | state root: contexts/*.yml, gateway/usage.db, gateway/audit.jsonl |
| ZOS_FIREWALL_MODE | enforce | enforce = block on violation; warn = annotate + pass through (any other value → enforce) |
| ZOS_ADMIN_TOKEN | unset | bearer token for the audit tail; endpoint returns 503 while unset |
| ANTHROPIC_API_KEY | unset | gateway-held key (fallback when the client sends no x-api-key) |
| ANTHROPIC_BASE_URL | https://api.anthropic.com | Anthropic upstream base (trailing / stripped) |
| OPENAI_BASE_URL | unset | OpenAI-compatible upstream base including the version path; unset → 501 on that route |
| OPENAI_API_KEY | unset | gateway-held key for the OpenAI route |
| ZOS_GATEWAY_HOST / ZOS_GATEWAY_PORT | 127.0.0.1 / 8788 | bind address/port for the zos-gateway entry point |

Upstream timeouts (constant): connect 10 s, read 600 s, write 60 s, pool 10 s. Redirects are not followed.
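
Call-time resolution means reading the environment inside the function, not at module import. A sketch of three of the lookups; the function names are hypothetical, the exact-match comparison against "warn" is an assumption about normalisation, and stripping the trailing slash from OPENAI_BASE_URL is assumed by analogy with the Anthropic base.

```python
import os
from typing import Mapping, Optional


def firewall_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Anything other than the literal value warn resolves to enforce."""
    env = os.environ if env is None else env
    return "warn" if env.get("ZOS_FIREWALL_MODE") == "warn" else "enforce"


def anthropic_messages_url(env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    base = env.get("ANTHROPIC_BASE_URL", "https://api.anthropic.com")
    return base.rstrip("/") + "/v1/messages"


def openai_chat_url(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the upstream URL, or None when the route is disabled (501)."""
    env = os.environ if env is None else env
    base = env.get("OPENAI_BASE_URL")
    if not base:
        return None
    # The base already carries the version path, e.g. .../v1
    return base.rstrip("/") + "/chat/completions"
```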

Context files & zos-core integration

Contexts live at $ZOS_HOME/contexts/<name>.yml (shared with zos-core) and are selected per request via X-ZOS-Context. The name default always resolves: if no default.yml exists it is a built-in permissive context (no deny list, no budget).

context: work
firewall:
  isolation: hard              # affects reported violation severity
  outbound_deny:
    - /srv/clients/acme        # path: segment-aligned prefix match
    - vault://client-secrets   # scheme://token: bounded whole-token match
    - "Project Nightingale"    # plain term: case-insensitive substring
budget:
  daily_tokens: 200000         # input+output, per UTC day; omit = unlimited
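
The three entry kinds in the comments above can be approximated with regular expressions. This is one reading of "segment-aligned" and "bounded whole-token", not the engine's actual matcher: compile_entry and scan are hypothetical names, and the boundary character class is an assumption (it deliberately over-matches /srv/clients/acme.txt so that a sentence-ending period does not hide a hit).

```python
import re
from typing import Dict, List

_WORD = r"[A-Za-z0-9_\-]"  # assumed "still inside a name" characters


def compile_entry(entry: str) -> "re.Pattern[str]":
    """Compile one outbound_deny entry according to its kind."""
    escaped = re.escape(entry)
    if "://" in entry:
        # scheme://token: must not be glued to a longer token on either side.
        return re.compile(rf"(?<!{_WORD}){escaped}(?!{_WORD})")
    if entry.startswith("/"):
        # path: must start a path and end on a segment boundary.
        return re.compile(rf"(?<![A-Za-z0-9_\-/]){escaped}(?!{_WORD})")
    # plain term: case-insensitive substring.
    return re.compile(escaped, re.IGNORECASE)


def scan(text: str, outbound_deny: List[str],
         isolation: str = "hard") -> List[Dict[str, str]]:
    """Return one finding per matching entry: pattern, excerpt, severity."""
    severity = "block" if isolation == "hard" else "warn"
    findings = []
    for entry in outbound_deny:
        match = compile_entry(entry).search(text)
        if match:
            start = max(match.start() - 30, 0)
            # split/join flattens newlines and runs of whitespace
            excerpt = " ".join(text[start:match.end() + 30].split())
            findings.append({"pattern": entry, "excerpt": excerpt,
                             "severity": severity})
    return findings
```

Under these assumptions /srv/clients/acme/q3.pdf is a hit while /srv/clients/acme-corp is not, and vault://client-secrets-old does not trigger the vault://client-secrets entry.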

The gateway codes against a thin adapter with the integration contract:

load_context(name: str, root: Path | None = None) -> Context
# Context fields: name, isolation, outbound_deny, read_allow,
#                 allowed_tools, register, role_default, extras
check_outbound(text: str, context: Context) -> list[Violation]
# Violation fields: pattern, excerpt, severity
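
For unit-testing code that sits on top of this contract, a stand-in with the same two signatures is enough. The sketch below is a test double, not zos-core: the field types, the "soft" isolation default, and the naive substring matching are all assumptions.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class Context:
    name: str
    isolation: str = "soft"                       # assumed default value
    outbound_deny: List[str] = field(default_factory=list)
    read_allow: List[str] = field(default_factory=list)
    allowed_tools: List[str] = field(default_factory=list)
    register: Optional[str] = None
    role_default: Optional[str] = None
    extras: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Violation:
    pattern: str
    excerpt: str
    severity: str


def load_context(name: str, root: Optional[Path] = None) -> Context:
    """Test double: ignores root and returns a permissive context."""
    return Context(name=name)


def check_outbound(text: str, context: Context) -> List[Violation]:
    """Test double: naive case-insensitive substring match per entry."""
    severity = "block" if context.isolation == "hard" else "warn"
    lowered = text.lower()
    return [Violation(pattern=entry,
                      excerpt=text[:80].replace("\n", " "),
                      severity=severity)
            for entry in context.outbound_deny if entry.lower() in lowered]
```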

This page mirrors docs/API.md in the zos-gateway repository, derived from the source at 0.1.0. Companion: zos-core library API · platform overview.