
zos-gateway — REST API.

The trust layer, derived from the source of zos_gateway 0.1.0. Working code, June 2026.

zos-gateway is the trust layer: a self-hosted HTTP service that every model call passes through. It speaks the same APIs your tools already use and governs each call with the active context's firewall and budget.

FastAPI · default 127.0.0.1:8788 · BUSL-1.1

The per-call pipeline

Both proxy endpoints run the same six stages:

client request
    |
(1) resolve context      X-ZOS-Context header -> $ZOS_HOME/contexts/<name>.yml
(2) firewall: REQUEST    scan all message text against outbound_deny
    |                      violation + enforce -> 403 · violation + warn -> header, continue
(3) budget check         budget.daily_tokens vs usage.db (UTC day) -> 429 if already met/exceeded
(4) forward              BYO key (client header) or gateway env key
    |                      streaming: SSE passthrough with in-flight scan
(5) firewall: RESPONSE   scan upstream text the same way -> 502 / header
(6) audit append         $ZOS_HOME/gateway/audit.jsonl (metadata only)
    |
client response

The request body must be a JSON object; anything else → 400 zos_invalid_request, before any other stage.
Upstream error responses (status ≥ 400) pass through with status and body untouched, plus the gateway's x-zos-* request-stage headers. They skip the response firewall scan because error payloads carry no model output.
Upstream unreachable (connect/read failure) → 502 zos_upstream_unreachable.
Audit-append and budget-accounting failures never take the data plane down.
The request body is forwarded byte-for-byte; the gateway never rewrites it. Client headers are never forwarded except the allowlisted set below, so a client cannot smuggle headers to the provider.

Firewall text extraction

What stages (2) and (5) actually scan:

Anthropic request — the system field plus every messages[].content: plain strings, {"type":"text"} blocks, and nested tool_result content, recursively. Response — every content[] text block.
OpenAI request — every messages[].content (same recursive collection). Response — every choices[].message.content.
Streaming — text deltas, scanned cumulatively as they arrive, so a deny term split across chunks is still caught.
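The recursive collection described above for Anthropic requests can be sketched in a few lines of Python. This is an illustration only, not the gateway's code: the function names `iter_text` and `anthropic_request_text` are hypothetical, and the sketch assumes that blocks of any other type (images, tool calls) contribute no text.

```python
from typing import Any, Iterator


def iter_text(content: Any) -> Iterator[str]:
    """Yield every text fragment reachable from a `content` value."""
    if isinstance(content, str):
        yield content
    elif isinstance(content, list):
        for block in content:
            yield from iter_text(block)
    elif isinstance(content, dict):
        if content.get("type") == "text" and isinstance(content.get("text"), str):
            yield content["text"]
        elif content.get("type") == "tool_result":
            # tool_result content may be a string or a nested list of blocks
            yield from iter_text(content.get("content"))
    # Anything else (None, numbers, other block types) yields nothing.


def anthropic_request_text(body: dict) -> list[str]:
    """Collect the `system` field plus every messages[].content, in order."""
    texts = list(iter_text(body.get("system")))
    for message in body.get("messages", []):
        if isinstance(message, dict):
            texts.extend(iter_text(message.get("content")))
    return texts
```

Each collected string would then be handed to the outbound_deny check for the active context.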

Endpoints

Request headers (client → gateway)

Header	Required	Meaning
`X-ZOS-Context`	no (default `default`)	selects `$ZOS_HOME/contexts/<name>.yml`; unknown name → 404
`x-api-key`	for `/v1/messages` BYO	client-supplied Anthropic key; falls back to gateway `ANTHROPIC_API_KEY`
`Authorization: Bearer`	for `/v1/chat/completions` BYO	client-supplied OpenAI-route key; falls back to gateway `OPENAI_API_KEY`
`anthropic-version`	no (default `2023-06-01`)	forwarded upstream on `/v1/messages`
`anthropic-beta`	no	forwarded upstream on `/v1/messages` when present
`x-request-id`	no	client-chosen request id; otherwise a uuid4 hex is generated
`accept`	no	forwarded upstream (default `application/json`)

All other client headers are not forwarded upstream.
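As a rough illustration of the allowlist for `/v1/messages`, the sketch below builds the upstream header set from the client's headers. The function name `upstream_headers` is hypothetical. It assumes that `X-ZOS-Context` and `x-request-id` are consumed by the gateway rather than sent upstream, and it leaves `content-type` out because the table above does not describe how it is handled.

```python
from typing import Mapping

# Defaults taken from the request-header table above.
DEFAULTS = {"anthropic-version": "2023-06-01", "accept": "application/json"}
# Forwarded only when the client sent them.
OPTIONAL = ("anthropic-beta",)


def upstream_headers(client_headers: Mapping[str, str], api_key: str) -> dict[str, str]:
    """Build the header set sent upstream; everything not listed is dropped."""
    incoming = {name.lower(): value for name, value in client_headers.items()}
    out = {"x-api-key": api_key}  # the key chosen by BYO/gateway resolution
    for name, default in DEFAULTS.items():
        out[name] = incoming.get(name, default)
    for name in OPTIONAL:
        if name in incoming:
            out[name] = incoming[name]
    return out
```

Because the output is built from a fixed list rather than by deleting known-bad names, a header the gateway has never heard of cannot reach the provider.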

POST /v1/messages

Anthropic Messages API passthrough (SSE streaming via "stream": true). Key: x-api-key (BYO) → gateway ANTHROPIC_API_KEY → 401. Body: a standard Messages API JSON object, forwarded byte-for-byte to ANTHROPIC_BASE_URL + /v1/messages; the gateway reads model (audit), stream (streaming switch), system and messages (firewall scan) — it never rewrites the body. Success returns the upstream status and body untouched, plus the x-zos-* headers below.

curl -s http://127.0.0.1:8788/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "X-ZOS-Context: work" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":64,
       "messages":[{"role":"user","content":"Say hello."}]}'

POST /v1/chat/completions

OpenAI-compatible passthrough. Disabled (501) until OPENAI_BASE_URL is set — and that base must include the version path (e.g. https://api.openai.com/v1); the gateway appends /chat/completions. Key: Authorization: Bearer (BYO) → gateway OPENAI_API_KEY → 401. Same pipeline, same errors, streaming included.

GET /healthz

Liveness, no auth, always 200: {"status":"ok","version":"0.1.0","zos_core":true,"firewall_mode":"enforce"} — zos_core reports whether the optional engine package was importable at startup; firewall_mode is the resolved mode.

GET /v1/contexts

No auth. Lists every *.yml/*.yaml stem under $ZOS_HOME/contexts/ plus the always-resolvable built-in default. Per context: name, isolation, outbound_deny_count, budget_daily_tokens (null = unlimited). A yml that exists but fails to load reports {"name": ..., "error": "failed to load"}.

GET /v1/audit/tail?n=50

Last n audit records, newest last (default 50, 1–1000; out-of-range → 422). Auth: Authorization: Bearer $ZOS_ADMIN_TOKEN. Unset token on the gateway → 503 zos_admin_disabled (endpoint disabled); wrong/missing bearer → 401 zos_unauthorized. Returns {"records":[...], "count": N}; an unparseable line surfaces as {"_unparseable": true}.
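The tail semantics can be sketched as a pure function over the lines of the audit file. The name `tail_records` is hypothetical, and two details are assumptions: blank lines are skipped, and `count` is the number of records returned.

```python
import json


def tail_records(lines: list[str], n: int = 50) -> dict:
    """Return the last n audit records, newest last."""
    if not 1 <= n <= 1000:
        # The gateway answers 422 for an out-of-range n.
        raise ValueError("n must be between 1 and 1000")
    non_empty = [line for line in lines if line.strip()]
    records = []
    for line in non_empty[-n:]:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # A corrupt line is surfaced as a marker, not dropped.
            records.append({"_unparseable": True})
    return {"records": records, "count": len(records)}
```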

Gateway response headers (proxy endpoints)

Header	When	Value
`x-zos-request-id`	always	the client's `x-request-id` or a generated uuid4 hex
`x-zos-context`	always	the resolved context name
`x-zos-firewall-request`	always	`ok` | `warn; violations=N` (a request-stage block returns the 403 body instead)
`x-zos-firewall-response`	non-streaming success only	`ok` | `skipped` | `warn; violations=N` (streamed verdicts land in the audit log, because the headers are already sent)
`x-zos-firewall-mode`	warn mode only	`warn`
`x-upstream-request-id`	when upstream sent `request-id`	the provider's request id
`content-type`	always	passed through from upstream
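A client that wants to act on the firewall verdict headers needs to split values such as `warn; violations=3`. A minimal sketch, with the hypothetical helper name `parse_verdict`, assuming the value always has the shapes listed in the table above:

```python
def parse_verdict(value: str) -> tuple[str, int]:
    """Split an x-zos-firewall-* header value into (verdict, violation count)."""
    verdict, _, rest = value.partition(";")
    rest = rest.strip()
    count = 0
    if rest.startswith("violations="):
        count = int(rest[len("violations="):])
    return verdict.strip(), count
```

For example, a caller could log a warning whenever the verdict from `x-zos-firewall-request` is `warn`, even though the call itself succeeded.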

API-key resolution — BYO vs gateway-held

Per request, per route — the first hit wins:

Route	1. BYO (client request)	2. Gateway-held (env)	3. Neither
`/v1/messages`	`x-api-key` header	`ANTHROPIC_API_KEY`	401 `zos_missing_api_key`
`/v1/chat/completions`	`Authorization: Bearer`	`OPENAI_API_KEY`	401 `zos_missing_api_key`

The chosen source lands in the audit record as key_source (byo | gateway; none on calls rejected before key resolution). Keys are never logged.
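The first-hit-wins order in the table can be sketched as follows. The function name `resolve_key` is hypothetical, and returning `"none"` as the source when no key is found is an assumption based on the key_source values listed above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_key(route: str, headers: Mapping[str, str],
                env: Optional[Mapping[str, str]] = None) -> Tuple[Optional[str], str]:
    """Return (key, key_source); a None key means 401 zos_missing_api_key."""
    env = os.environ if env is None else env
    incoming = {name.lower(): value for name, value in headers.items()}
    if route == "/v1/messages":
        byo = incoming.get("x-api-key")
        held = env.get("ANTHROPIC_API_KEY")
    elif route == "/v1/chat/completions":
        auth = incoming.get("authorization", "")
        byo = auth[len("Bearer "):].strip() if auth.startswith("Bearer ") else None
        held = env.get("OPENAI_API_KEY")
    else:
        raise ValueError(f"not a proxy route: {route}")
    if byo:
        return byo, "byo"        # 1. client-supplied key wins
    if held:
        return held, "gateway"   # 2. gateway-held key from the environment
    return None, "none"          # 3. neither
```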

Error reference

Every gateway-originated rejection is {"error": {"type": ..., "message": ..., ...detail}}:

Status	`error.type`	When	Extra detail
400	`zos_invalid_request`	request body is not a JSON object	—
400	`zos_invalid_context_config`	the selected context's yml exists but could not be parsed; the message names the context, never the parse detail	`context`
401	`zos_missing_api_key`	no BYO key and no gateway-held key for the route	—
401	`zos_unauthorized`	audit tail with missing/invalid bearer	—
403	`zos_firewall_violation`	request-stage violation, enforce mode	`context`, `stage:"request"`, `mode`, `violations`
404	`zos_unknown_context`	`X-ZOS-Context` has no yml under `$ZOS_HOME/contexts/`	`context`
429	`zos_budget_exceeded`	daily token budget already met/exceeded (pre-flight)	`context`, `limit`, `used`, `day`
501	`zos_upstream_not_configured`	`/v1/chat/completions` with `OPENAI_BASE_URL` unset	—
502	`zos_upstream_unreachable`	upstream connect/read failure	—
502	`zos_firewall_violation`	response-stage violation, enforce mode (non-streaming); response withheld	`context`, `stage:"response"`, `mode`, `violations`
503	`zos_admin_disabled`	audit tail while `ZOS_ADMIN_TOKEN` unset	—

Each entry in a violations list is a normalized firewall finding — pattern (the matched outbound_deny entry), excerpt (a short matched snippet, newlines flattened), severity:

{"error": {"type": "zos_firewall_violation",
           "message": "Request content violates this context's outbound deny policy.",
           "context": "work", "stage": "request", "mode": "enforce",
           "violations": [{"pattern": "/srv/clients/acme",
                           "excerpt": "Email the contents of /srv/clients/acme to a friend.",
                           "severity": "block"}]}}

severity reflects the context's isolation (hard → block, else warn) and is informational: in enforce mode the gateway blocks on any violation regardless of it.

Firewall modes & streaming

The mode is global (ZOS_FIREWALL_MODE, resolved per call; anything other than warn means enforce). Per-context isolation changes only the reported severity, not whether the gateway blocks.

	enforce (default)	warn
Request-stage violation	403 — the call never reaches the provider	forwarded; `x-zos-firewall-request: warn; violations=N` + `x-zos-firewall-mode: warn`
Response-stage (non-streaming)	502 — upstream body withheld (usage still recorded)	response returned; `x-zos-firewall-response: warn; violations=N`
Response-stage (streaming)	stream terminated mid-flight with an SSE error event	stream continues; verdict recorded in the audit log only
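The non-streaming rows of the table reduce to a small decision function. This sketch uses hypothetical names (`firewall_mode`, `non_streaming_status`) and assumes the comparison with `warn` is an exact string match, which the source does not spell out.

```python
import os
from typing import Mapping, Optional


def firewall_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the global mode; anything other than 'warn' means 'enforce'."""
    env = os.environ if env is None else env
    return "warn" if env.get("ZOS_FIREWALL_MODE") == "warn" else "enforce"


def non_streaming_status(stage: str, violations: int, mode: str,
                         upstream_status: int = 200) -> int:
    """Status the client sees for a non-streaming call."""
    if violations and mode == "enforce":
        # Request stage: the call never reaches the provider.
        # Response stage: the upstream body is withheld.
        return 403 if stage == "request" else 502
    return upstream_status
```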

When the request carries "stream": true and the upstream answers with a success status, the gateway returns an SSE passthrough (upstream content-type preserved). Each complete SSE event is parsed; text deltas are accumulated and the accumulated text is re-scanned on every delta. On a violation in enforce mode the gateway emits one final SSE event and closes the stream:

event: error
data: {"type": "error", "error": {"type": "zos_firewall_violation",
       "message": "Stream terminated: response content violates this context's outbound deny policy.",
       "context": "work", "stage": "response", "violations": [...]}}

Token usage observed in the stream is recorded against the budget and the audit record (streamed: true) in all cases — including a terminated stream and a client disconnect. If the upstream errors before the stream starts, the error body passes through as a normal non-streaming response.
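To see why cumulative scanning catches a deny term split across chunks, consider the sketch below. It is not the gateway's implementation: `guarded_stream` is a hypothetical name, matching is reduced to a case-insensitive substring test, and withholding the delta that completes a match is an assumption.

```python
from typing import Iterable, Iterator, Tuple, Union

Event = Tuple[str, Union[str, list]]


def guarded_stream(deltas: Iterable[str], deny_terms: list[str]) -> Iterator[Event]:
    """Pass deltas through until the accumulated text hits a deny term."""
    seen = ""
    for delta in deltas:
        seen += delta
        lowered = seen.lower()
        # Re-scan the whole accumulated text on every delta.
        hits = [term for term in deny_terms if term.lower() in lowered]
        if hits:
            yield ("error", hits)  # one final event, then the stream closes
            return
        yield ("delta", delta)
```

With the deltas `"Project Night"` and `"ingale is go"`, neither chunk contains `Project Nightingale` on its own, but the accumulated text does, so the second delta is replaced by the error event.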

Budgets

A context opts in with budget.daily_tokens in its yml; no key = unlimited. Usage (observed upstream input + output tokens) is tracked in sqlite at $ZOS_HOME/gateway/usage.db, keyed by (context, UTC date) — the counter "resets" daily because a new UTC date is a new row. The check is pre-flight: a request is refused (429) only when today's usage already meets/exceeds the limit, so the call that crosses the line completes and the next is refused. Usage is recorded after every completed call, including firewall-withheld responses (the provider did the work) and terminated streams.

budget:
  daily_tokens: 200000
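The pre-flight behaviour is easier to see in code. The sketch below keeps a counter per (context, UTC date) in sqlite; the table layout and the function names are assumptions for illustration, not the schema of the real usage.db.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def utc_day() -> str:
    """Today's UTC date, e.g. '2026-06-09'."""
    return datetime.now(timezone.utc).date().isoformat()


def open_usage(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS usage ("
        "context TEXT NOT NULL, day TEXT NOT NULL, "
        "tokens INTEGER NOT NULL DEFAULT 0, PRIMARY KEY (context, day))"
    )
    return db


def used(db: sqlite3.Connection, context: str, day: str) -> int:
    row = db.execute(
        "SELECT tokens FROM usage WHERE context = ? AND day = ?", (context, day)
    ).fetchone()
    return row[0] if row else 0


def allowed(db: sqlite3.Connection, context: str, day: str, limit: Optional[int]) -> bool:
    """Pre-flight check: refuse only when usage already meets the limit."""
    return limit is None or used(db, context, day) < limit


def record_usage(db: sqlite3.Connection, context: str, day: str, tokens: int) -> None:
    """Add observed input + output tokens after a completed call."""
    db.execute(
        "INSERT OR IGNORE INTO usage (context, day, tokens) VALUES (?, ?, 0)",
        (context, day),
    )
    db.execute(
        "UPDATE usage SET tokens = tokens + ? WHERE context = ? AND day = ?",
        (tokens, context, day),
    )
    db.commit()
```

With a limit of 100, a context at 60 tokens is still allowed; a 70-token call then completes and takes it to 130, and only the next call is refused. A new UTC date has no row yet, so the counter starts from zero.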

Audit JSONL record schema

One JSON object per line, appended (O_APPEND, file mode 0600) to $ZOS_HOME/gateway/audit.jsonl. Metadata only — never message content, never API keys (a defensive schema guard refuses forbidden field names).

Field	Type	Meaning
`ts`	str	UTC ISO-8601 timestamp
`request_id`	str	client `x-request-id` or generated uuid4 hex
`endpoint`	str	`/v1/messages` | `/v1/chat/completions`
`context`	str	the resolved context name (on 400/404 records: the requested, possibly unresolved, name)
`model`	str | null	`model` from the request body
`key_source`	str	`byo` | `gateway` | `none`
`input_tokens` / `output_tokens`	int | null	upstream-reported token counts (null when unknown)
`firewall.request`	str	`ok` | `warn` | `block`
`firewall.response`	str	`ok` | `warn` | `block` | `skipped`
`firewall.request_violations` / `firewall.response_violations`	int	violation counts per stage
`latency_ms`	int | null	wall time from forward to upstream completion (null on pre-forward rejections)
`status`	int	the HTTP status returned to the client
`streamed`	bool	whether the call was an SSE stream
`reason`	str	only on early-rejection records: `invalid_json` | `invalid_context_config` | `unknown_context` | `missing_api_key`; absent otherwise

Records are written for every proxy call, rejected or forwarded: all forwarded calls (success or upstream error), the gateway's own 403 / 429 / 502 rejections, and the pre-pipeline rejections (400 invalid JSON / invalid context config, 404 unknown context, 401 missing API key) — which carry the short reason field so e.g. auth-probing attempts are visible to operators. As always: zero message content, zero keys.
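An append of this kind, with O_APPEND, mode 0600, and a guard against content-bearing fields, might look like the sketch below. The function names and the exact set of forbidden field names are assumptions; the source only states that such a guard exists.

```python
import json
import os

# Assumed list; the real guard's forbidden names are not documented here.
FORBIDDEN_FIELDS = {"messages", "system", "content", "prompt",
                    "api_key", "x-api-key", "authorization"}


def check_fields(value) -> None:
    """Refuse any record that carries a forbidden field name, at any depth."""
    if isinstance(value, dict):
        for key, inner in value.items():
            if str(key).lower() in FORBIDDEN_FIELDS:
                raise ValueError(f"forbidden audit field: {key}")
            check_fields(inner)
    elif isinstance(value, list):
        for item in value:
            check_fields(item)


def append_audit(path: str, record: dict) -> None:
    """Append one JSON object as a single line."""
    check_fields(record)
    line = json.dumps(record, separators=(",", ":")) + "\n"
    # O_APPEND makes each write land at the end of the file;
    # 0o600 applies only when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
```

A caller would typically wrap `append_audit` in a broad exception handler, since audit failures are not supposed to fail the proxied call.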

{"ts": "2026-06-09T12:00:00+00:00", "request_id": "9be0...", "endpoint": "/v1/messages",
 "context": "work", "model": "claude-sonnet-4-5", "key_source": "byo",
 "input_tokens": 10, "output_tokens": 5,
 "firewall": {"request": "ok", "response": "ok", "request_violations": 0, "response_violations": 0},
 "latency_ms": 412, "status": 200, "streamed": false}

Environment variables

Resolved at call time, not import time, so a process can react to environment changes.

Env var	Default	Meaning
`ZOS_HOME`	`~/.zos`	state root: `contexts/*.yml`, `gateway/usage.db`, `gateway/audit.jsonl`
`ZOS_FIREWALL_MODE`	`enforce`	`enforce` = block on violation; `warn` = annotate + pass through (any other value → `enforce`)
`ZOS_ADMIN_TOKEN`	unset	bearer token for the audit tail; endpoint returns 503 while unset
`ANTHROPIC_API_KEY`	unset	gateway-held key (fallback when the client sends no `x-api-key`)
`ANTHROPIC_BASE_URL`	`https://api.anthropic.com`	Anthropic upstream base (trailing `/` stripped)
`OPENAI_BASE_URL`	unset	OpenAI-compatible upstream base including the version path; unset → 501 on that route
`OPENAI_API_KEY`	unset	gateway-held key for the OpenAI route
`ZOS_GATEWAY_HOST` / `ZOS_GATEWAY_PORT`	`127.0.0.1` / `8788`	bind address/port for the `zos-gateway` entry point

Upstream timeouts (constant): connect 10 s, read 600 s, write 60 s, pool 10 s. Redirects are not followed.

Context files & zos-core integration

Contexts live at $ZOS_HOME/contexts/<name>.yml (shared with zos-core) and are selected per request via X-ZOS-Context. The name default always resolves: if no default.yml exists it is a built-in permissive context (no deny list, no budget).

context: work
firewall:
  isolation: hard              # affects reported violation severity
  outbound_deny:
    - /srv/clients/acme        # path: segment-aligned prefix match
    - vault://client-secrets   # scheme://token: bounded whole-token match
    - "Project Nightingale"    # plain term: case-insensitive substring
budget:
  daily_tokens: 200000         # input+output, per UTC day; omit = unlimited
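How a context name maps to a file can be sketched with `pathlib`. The name `resolve_context` and its return shape are hypothetical; the rules it encodes (`.yml` preferred over `.yaml`, a built-in `default`, unknown names rejected) come from this page. The sketch does not validate the name itself.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_context(name: str, home: Path) -> Tuple[str, Optional[Path]]:
    """Return ('file', path), ('builtin', None) or ('unknown', None)."""
    for suffix in (".yml", ".yaml"):  # .yml wins when both exist
        candidate = home / "contexts" / f"{name}{suffix}"
        if candidate.is_file():
            return "file", candidate
    if name == "default":
        # No default file on disk: permissive built-in, no deny list, no budget.
        return "builtin", None
    return "unknown", None  # the proxy endpoints answer 404 zos_unknown_context
```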

The gateway codes against a thin adapter that exposes this integration contract:

load_context(name: str, root: Path | None = None) -> Context
# Context fields: name, isolation, outbound_deny, read_allow,
#                 allowed_tools, register, role_default, extras
check_outbound(text: str, context: Context) -> list[Violation]
# Violation fields: pattern, excerpt, severity

If the sibling zos_core package is importable it is used (auto-detected, reported by /healthz); otherwise a built-in fallback (YAML loader + the same three matching shapes) keeps the gateway fully functional standalone.
budget.daily_tokens is read from Context.extras["budget"]["daily_tokens"]; non-integer or negative values mean unlimited.
Both loaders default isolation to open and accept .yml as well as .yaml context files (.yml wins when both exist).
A context yml that exists but cannot be parsed is a structured 400 zos_invalid_context_config on the proxy endpoints — never a 500 — in both modes, and the rejection is audited.
read_allow and allowed_tools are loaded and exposed on the context but are not enforced by the gateway pipeline today.
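The three matching shapes named in the context example can be approximated with regular expressions. This is a guess at the semantics for illustration, not the zos-core or fallback matcher: the names `compile_deny` and `matched_entries`, the way entries are classified, the boundary characters, and case-sensitive path matching are all assumptions.

```python
import re


def compile_deny(entry: str) -> re.Pattern:
    """Compile one outbound_deny entry into a pattern (assumed semantics)."""
    if entry.startswith("/"):
        # Path: prefix match that must end on a segment boundary, so
        # /srv/clients/acme matches /srv/clients/acme/x but not .../acmecorp.
        base = entry.rstrip("/") or entry
        return re.compile(re.escape(base) + r"(?![\w-])")
    if "://" in entry:
        # scheme://token: the whole token, bounded on both sides.
        return re.compile(r"(?<![\w-])" + re.escape(entry) + r"(?![\w-])")
    # Plain term: case-insensitive substring.
    return re.compile(re.escape(entry), re.IGNORECASE)


def matched_entries(text: str, outbound_deny: list[str]) -> list[str]:
    """Return the deny entries that match somewhere in the text."""
    return [entry for entry in outbound_deny if compile_deny(entry).search(text)]
```

Under these assumptions, `ls /srv/clients/acmecorp` and `vault://client-secrets-old` do not match the example deny list, while `project nightingale` in any letter case does.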

This page mirrors docs/API.md in the zos-gateway repository, derived from the source at 0.1.0.