zos-gateway — REST API.
zos-gateway is the trust layer: a self-hosted HTTP service every model call passes through. It speaks the same APIs your tools already use, and governs each call with the active context's firewall and budget. FastAPI · default 127.0.0.1:8788 · BUSL-1.1
The per-call pipeline
Both proxy endpoints run the same six stages:
client request
   |
(1) resolve context      X-ZOS-Context header -> $ZOS_HOME/contexts/<name>.yml
(2) firewall: REQUEST    scan all message text against outbound_deny
   |                     violation + enforce -> 403 · violation + warn -> header, continue
(3) budget check         budget.daily_tokens vs usage.db (UTC day) -> 429 if exceeded
(4) forward              BYO key (client header) or gateway env key
   |                     streaming: SSE passthrough with in-flight scan
(5) firewall: RESPONSE   scan upstream text the same way -> 502 / header
(6) audit                append $ZOS_HOME/gateway/audit.jsonl (metadata only)
   |
client response
- The request body must be a JSON object; anything else → 400 zos_invalid_request, before any other stage.
- Upstream error responses (status ≥ 400) pass through untouched, with the gateway's x-zos-* request-stage headers added. They get no response firewall scan, because error payloads carry no model output.
- Upstream unreachable (connect/read failure) → 502 zos_upstream_unreachable.
- Audit-append and budget-accounting failures never take the data plane down.
- The request body is forwarded byte-for-byte; the gateway never rewrites it. Client headers are never forwarded except the allowlisted set below, so a client can never smuggle headers to the provider.
Firewall text extraction
What stages (2) and (5) actually scan:
- Anthropic request: the system field plus every messages[].content (plain strings, {"type":"text"} blocks, and nested tool_result content, recursively). Response: every content[] text block.
- OpenAI request: every messages[].content (same recursive collection). Response: every choices[].message.content.
- Streaming: text deltas, scanned cumulatively as they arrive, so a deny term split across chunks is still caught.
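The recursive collection described for Anthropic requests can be sketched as follows. This is a minimal illustration under stated assumptions, not the gateway's actual extractor: the function names `iter_text` and `anthropic_request_text` are hypothetical, and block types other than text and tool_result are simply skipped.

```python
from typing import Any, Iterator


def iter_text(content: Any) -> Iterator[str]:
    """Yield every text fragment found in a message `content` value."""
    if isinstance(content, str):
        yield content
    elif isinstance(content, list):
        for block in content:
            yield from iter_text(block)
    elif isinstance(content, dict):
        if content.get("type") == "text" and isinstance(content.get("text"), str):
            yield content["text"]
        elif content.get("type") == "tool_result":
            # tool_result content may itself be a string or a list of blocks
            yield from iter_text(content.get("content"))
    # Anything else (None, numbers, unknown block types) contributes no text.


def anthropic_request_text(body: dict) -> list[str]:
    """Collect the `system` field plus every messages[].content, in order."""
    parts = list(iter_text(body.get("system")))
    for message in body.get("messages", []):
        if isinstance(message, dict):
            parts.extend(iter_text(message.get("content")))
    return parts
```

Each collected fragment would then be passed to the firewall check; whether the real gateway scans fragments one by one or joined together is not stated here.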
Endpoints
Request headers (client → gateway)
| Header | Required | Meaning |
|---|---|---|
| X-ZOS-Context | no (default default) | selects $ZOS_HOME/contexts/<name>.yml; unknown name → 404 |
| x-api-key | for /v1/messages BYO | client-supplied Anthropic key; falls back to gateway ANTHROPIC_API_KEY |
| Authorization: Bearer | for /v1/chat/completions BYO | client-supplied OpenAI-route key; falls back to gateway OPENAI_API_KEY |
| anthropic-version | no (default 2023-06-01) | forwarded upstream on /v1/messages |
| anthropic-beta | no | forwarded upstream on /v1/messages when present |
| x-request-id | no | client-chosen request id; otherwise a uuid4 hex is generated |
| accept | no | forwarded upstream (default application/json) |
All other client headers are not forwarded upstream.
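The allowlist above can be pictured as a small filter. The sketch below is an illustration only: `forwarded_headers` is a hypothetical name, it covers just the non-credential headers the table marks as forwarded, and it leaves out the API key (covered under key resolution) and whatever content-type handling the gateway does.

```python
def forwarded_headers(route: str, client_headers: dict[str, str]) -> dict[str, str]:
    """Build the non-credential upstream headers from an allowlist.

    Anything not named here is dropped, so an arbitrary client header
    never reaches the provider.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in client_headers.items()}
    out = {"accept": h.get("accept", "application/json")}
    if route == "/v1/messages":
        out["anthropic-version"] = h.get("anthropic-version", "2023-06-01")
        if "anthropic-beta" in h:
            out["anthropic-beta"] = h["anthropic-beta"]
    return out
```

Building the outgoing headers from scratch, rather than deleting known-bad names from the incoming set, is what makes the "cannot smuggle headers" property hold by construction.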
POST /v1/messages
Anthropic Messages API passthrough (SSE streaming via "stream": true). Key: x-api-key (BYO) → gateway ANTHROPIC_API_KEY → 401. Body: a standard Messages API JSON object, forwarded byte-for-byte to ANTHROPIC_BASE_URL + /v1/messages; the gateway reads model (audit), stream (streaming switch), system and messages (firewall scan) — it never rewrites the body. Success returns the upstream status and body untouched, plus the x-zos-* headers below.
curl -s http://127.0.0.1:8788/v1/messages \
-H "content-type: application/json" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "X-ZOS-Context: work" \
-d '{"model":"claude-sonnet-4-5","max_tokens":64,
"messages":[{"role":"user","content":"Say hello."}]}'
POST /v1/chat/completions
OpenAI-compatible passthrough. Disabled (501) until OPENAI_BASE_URL is set — and that base must include the version path (e.g. https://api.openai.com/v1); the gateway appends /chat/completions. Key: Authorization: Bearer (BYO) → gateway OPENAI_API_KEY → 401. Same pipeline, same errors, streaming included.
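For clients written in Python, a request to this route might be assembled as below. This is a hedged sketch using only the standard library: `build_chat_request` is a hypothetical helper, the model name is whatever your upstream accepts, and the gateway address is the documented default.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    model: str,
    prompt: str,
    *,
    context: str = "default",
    api_key: Optional[str] = None,
    gateway: str = "http://127.0.0.1:8788",
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the gateway's OpenAI-compatible route."""
    headers = {"content-type": "application/json", "X-ZOS-Context": context}
    if api_key:
        # BYO key; omit it to fall back to the gateway-held OPENAI_API_KEY.
        headers["Authorization"] = f"Bearer {api_key}"
    payload = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{gateway}/v1/chat/completions", data=payload, headers=headers, method="POST"
    )
```

Sending it with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` on any 4xx/5xx, including the 501 returned while OPENAI_BASE_URL is unset, so callers would normally wrap the call in a try/except and read the JSON error body from the exception.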
GET /healthz
Liveness, no auth, always 200: {"status":"ok","version":"0.1.0","zos_core":true,"firewall_mode":"enforce"} — zos_core reports whether the optional engine package was importable at startup; firewall_mode is the resolved mode.
GET /v1/contexts
No auth. Lists every *.yml/*.yaml stem under $ZOS_HOME/contexts/ plus the always-resolvable built-in default. Per context: name, isolation, outbound_deny_count, budget_daily_tokens (null = unlimited). A yml that exists but fails to load reports {"name": ..., "error": "failed to load"}.
GET /v1/audit/tail?n=50
Last n audit records, newest last (default 50, 1–1000; out-of-range → 422). Auth: Authorization: Bearer $ZOS_ADMIN_TOKEN. Unset token on the gateway → 503 zos_admin_disabled (endpoint disabled); wrong/missing bearer → 401 zos_unauthorized. Returns {"records":[...], "count": N}; an unparseable line surfaces as {"_unparseable": true}.
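The tail behaviour can be sketched in a few lines. This is an assumption-laden illustration, not the endpoint's code: `tail_records` is a hypothetical name, blank lines are skipped (the source does not say how they are treated), and the range check raises ValueError where the real endpoint answers 422.

```python
import json
from typing import Iterable


def tail_records(lines: Iterable[str], n: int = 50) -> dict:
    """Return the last `n` audit lines as parsed records, newest last."""
    if not 1 <= n <= 1000:
        raise ValueError("n must be between 1 and 1000")
    kept = [line for line in lines if line.strip()][-n:]
    records = []
    for line in kept:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # A corrupt line is reported, not dropped, so counts stay honest.
            records.append({"_unparseable": True})
    return {"records": records, "count": len(records)}
```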
Gateway response headers (proxy endpoints)
| Header | When | Value |
|---|---|---|
| x-zos-request-id | always | the client's x-request-id or a generated uuid4 hex |
| x-zos-context | always | the resolved context name |
| x-zos-firewall-request | always | ok \| warn; violations=N (a request-stage block returns the 403 body instead) |
| x-zos-firewall-response | non-streaming success only | ok \| skipped \| warn; violations=N (streamed verdicts land in the audit log, since headers are already sent) |
| x-zos-firewall-mode | warn mode only | warn |
| x-upstream-request-id | when upstream sent request-id | the provider's request id |
| content-type | always | passed through from upstream |
API-key resolution — BYO vs gateway-held
Per request, per route — the first hit wins:
| Route | 1. BYO (client request) | 2. Gateway-held (env) | 3. Neither |
|---|---|---|---|
| /v1/messages | x-api-key header | ANTHROPIC_API_KEY | 401 zos_missing_api_key |
| /v1/chat/completions | Authorization: Bearer | OPENAI_API_KEY | 401 zos_missing_api_key |
The chosen source lands in the audit record as key_source (byo | gateway; none on calls rejected before key resolution). Keys are never logged.
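The first-hit-wins order can be sketched as follows. The function name `resolve_key` and the `ROUTE_ENV` mapping are hypothetical, and the Bearer parsing (case-insensitive scheme, surrounding whitespace trimmed) is an assumption rather than documented behaviour.

```python
import os
from typing import Mapping, Optional, Tuple

# Which environment variable holds the gateway-side key for each route.
ROUTE_ENV = {
    "/v1/messages": "ANTHROPIC_API_KEY",
    "/v1/chat/completions": "OPENAI_API_KEY",
}


def resolve_key(
    route: str,
    headers: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[str, str]]:
    """Return (key, key_source), or None when the caller should answer 401."""
    env = os.environ if env is None else env  # read at call time
    h = {name.lower(): value for name, value in headers.items()}
    if route == "/v1/messages":
        byo = h.get("x-api-key")
    else:
        auth = h.get("authorization", "")
        byo = auth[7:].strip() if auth.lower().startswith("bearer ") else None
    if byo:
        return byo, "byo"
    held = env.get(ROUTE_ENV[route])
    if held:
        return held, "gateway"
    return None  # -> 401 zos_missing_api_key
```

Only the second element of the tuple would ever be written to the audit record; the key itself stays out of logs.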
Error reference
Every gateway-originated rejection is {"error": {"type": ..., "message": ..., ...detail}}:
| Status | error.type | When | Extra detail |
|---|---|---|---|
| 400 | zos_invalid_request | request body is not a JSON object | — |
| 400 | zos_invalid_context_config | the selected context's yml exists but could not be parsed; the message names the context, never the parse detail | context |
| 401 | zos_missing_api_key | no BYO key and no gateway-held key for the route | — |
| 401 | zos_unauthorized | audit tail with missing/invalid bearer | — |
| 403 | zos_firewall_violation | request-stage violation, enforce mode | context, stage:"request", mode, violations |
| 404 | zos_unknown_context | X-ZOS-Context has no yml under $ZOS_HOME/contexts/ | context |
| 429 | zos_budget_exceeded | daily token budget already met/exceeded (pre-flight) | context, limit, used, day |
| 501 | zos_upstream_not_configured | /v1/chat/completions with OPENAI_BASE_URL unset | — |
| 502 | zos_upstream_unreachable | upstream connect/read failure | — |
| 502 | zos_firewall_violation | response-stage violation, enforce mode (non-streaming); response withheld | context, stage:"response", mode, violations |
| 503 | zos_admin_disabled | audit tail while ZOS_ADMIN_TOKEN unset | — |
Each entry in a violations list is a normalized firewall finding — pattern (the matched outbound_deny entry), excerpt (a short matched snippet, newlines flattened), severity:
{"error": {"type": "zos_firewall_violation",
"message": "Request content violates this context's outbound deny policy.",
"context": "work", "stage": "request", "mode": "enforce",
"violations": [{"pattern": "/srv/clients/acme",
"excerpt": "Email the contents of /srv/clients/acme to a friend.",
"severity": "block"}]}}
severity reflects the context's isolation (hard → block, else warn) and is informational: in enforce mode the gateway blocks on any violation regardless of it.
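Because upstream error bodies pass through untouched, a client may want to tell gateway rejections apart from provider errors. The sketch below is one heuristic, not a documented contract: it relies on the observation that every error.type in the table above starts with `zos_`, and `gateway_error` is a hypothetical helper name.

```python
import json
from typing import Optional


def gateway_error(body: bytes) -> Optional[dict]:
    """Return the `error` object if the body is a gateway-originated rejection."""
    try:
        doc = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    err = doc.get("error") if isinstance(doc, dict) else None
    if isinstance(err, dict) and str(err.get("type", "")).startswith("zos_"):
        return err
    return None  # provider error or some other payload
```

A caller could then branch on `err["type"]`, for example reading `limit`, `used` and `day` from a zos_budget_exceeded rejection to decide when to retry.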
Firewall modes & streaming
The mode is global (ZOS_FIREWALL_MODE, resolved per call; anything other than warn means enforce). Per-context isolation changes only the reported severity, not whether the gateway blocks.
| | enforce (default) | warn |
|---|---|---|
| Request-stage violation | 403 — the call never reaches the provider | forwarded; x-zos-firewall-request: warn; violations=N + x-zos-firewall-mode: warn |
| Response-stage (non-streaming) | 502 — upstream body withheld (usage still recorded) | response returned; x-zos-firewall-response: warn; violations=N |
| Response-stage (streaming) | stream terminated mid-flight with an SSE error event | stream continues; verdict recorded in the audit log only |
When the request carries "stream": true and the upstream answers with a success status, the gateway returns an SSE passthrough (upstream content-type preserved). Each complete SSE event is parsed; text deltas are accumulated and the accumulated text is re-scanned on every delta. On a violation in enforce mode the gateway emits one final SSE event and closes the stream:
event: error
data: {"type": "error", "error": {"type": "zos_firewall_violation",
"message": "Stream terminated: response content violates this context's outbound deny policy.",
"context": "work", "stage": "response", "violations": [...]}}
Token usage observed in the stream is recorded against the budget and the audit record (streamed: true) in all cases — including a terminated stream and a client disconnect. If the upstream errors before the stream starts, the error body passes through as a normal non-streaming response.
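The cumulative scan can be illustrated with a small generator. This is a simplified sketch, not the gateway's streaming code: it works on already-extracted text deltas rather than raw SSE events, `find_violations` is a hypothetical callable standing in for the firewall check, and withholding the delta that completes a violation is an assumption.

```python
import json
from typing import Callable, Iterable, Iterator, List


def scan_stream(
    deltas: Iterable[str],
    find_violations: Callable[[str], List[dict]],
    enforce: bool = True,
    context: str = "work",
) -> Iterator[str]:
    """Pass deltas through, re-scanning the accumulated text on every delta."""
    seen = ""
    for delta in deltas:
        seen += delta
        violations = find_violations(seen)  # whole text so far, not just this chunk
        if violations and enforce:
            payload = {
                "type": "error",
                "error": {
                    "type": "zos_firewall_violation",
                    "message": "Stream terminated: response content violates "
                               "this context's outbound deny policy.",
                    "context": context,
                    "stage": "response",
                    "violations": violations,
                },
            }
            yield "event: error\ndata: " + json.dumps(payload) + "\n\n"
            return  # close the stream after the single error event
        yield delta  # warn mode, or nothing found yet
```

Scanning the running total is what catches a deny term split across chunks, at the cost of re-scanning text that was already checked; a production implementation might bound that by keeping only a trailing window as long as the longest deny entry.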
Budgets
A context opts in with budget.daily_tokens in its yml; no key = unlimited. Usage (observed upstream input + output tokens) is tracked in sqlite at $ZOS_HOME/gateway/usage.db, keyed by (context, UTC date) — the counter "resets" daily because a new UTC date is a new row. The check is pre-flight: a request is refused (429) only when today's usage already meets/exceeds the limit, so the call that crosses the line completes and the next is refused. Usage is recorded after every completed call, including firewall-withheld responses (the provider did the work) and terminated streams.
budget:
  daily_tokens: 200000
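The pre-flight semantics are easy to get wrong, so here is a minimal sketch of the accounting. The table name, column names and function names are all hypothetical (the real usage.db schema is not documented here); only the (context, UTC date) key and the "meets or exceeds" check come from the text above.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def utc_day(now: Optional[datetime] = None) -> str:
    """Today's UTC date as YYYY-MM-DD; a new date means a fresh row."""
    return (now or datetime.now(timezone.utc)).strftime("%Y-%m-%d")


def open_usage(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS usage ("
        " context TEXT NOT NULL, day TEXT NOT NULL,"
        " tokens INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (context, day))"
    )
    return db


def used_today(db: sqlite3.Connection, context: str, day: str) -> int:
    row = db.execute(
        "SELECT tokens FROM usage WHERE context = ? AND day = ?", (context, day)
    ).fetchone()
    return row[0] if row else 0


def over_budget(db: sqlite3.Connection, context: str, limit: Optional[int], day: str) -> bool:
    """Pre-flight check: refuse only when usage already meets the limit."""
    return limit is not None and used_today(db, context, day) >= limit


def record_usage(db: sqlite3.Connection, context: str, tokens: int, day: str) -> None:
    """Add observed input + output tokens after a completed call."""
    db.execute("INSERT OR IGNORE INTO usage (context, day) VALUES (?, ?)", (context, day))
    db.execute(
        "UPDATE usage SET tokens = tokens + ? WHERE context = ? AND day = ?",
        (tokens, context, day),
    )
    db.commit()
```

With a limit of 100, a call that uses 150 tokens on an empty day still completes; the next call that day is the one refused, and the day after starts from zero because it reads a different row.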
Audit JSONL record schema
One JSON object per line, appended (O_APPEND, file mode 0600) to $ZOS_HOME/gateway/audit.jsonl. Metadata only — never message content, never API keys (a defensive schema guard refuses forbidden field names).
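An append of this kind might look like the sketch below. It is an illustration under assumptions: `append_audit` and the contents of `FORBIDDEN_FIELDS` are hypothetical (the real guard's list is not given here), and the 0600 mode only applies when the file is first created and is subject to the process umask.

```python
import json
import os

# Hypothetical deny-list of field names that must never reach the audit log.
FORBIDDEN_FIELDS = {"messages", "content", "system", "api_key", "authorization", "x-api-key"}


def _all_keys(value) -> set:
    """Collect dict keys at every nesting level, lower-cased."""
    keys = set()
    if isinstance(value, dict):
        for key, inner in value.items():
            keys.add(str(key).lower())
            keys |= _all_keys(inner)
    elif isinstance(value, list):
        for item in value:
            keys |= _all_keys(item)
    return keys


def append_audit(path: str, record: dict) -> None:
    """Append one metadata-only record as a single JSON line."""
    bad = FORBIDDEN_FIELDS & _all_keys(record)
    if bad:
        raise ValueError(f"refusing to audit forbidden fields: {sorted(bad)}")
    line = json.dumps(record, separators=(",", ":")) + "\n"
    # O_APPEND makes each write land at the current end of file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
```

Writing the whole line in one `os.write` call keeps records from interleaving when several workers append concurrently, at least for lines of modest size on a local filesystem.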
| Field | Type | Meaning |
|---|---|---|
| ts | str | UTC ISO-8601 timestamp |
| request_id | str | client x-request-id or generated uuid4 hex |
| endpoint | str | /v1/messages \| /v1/chat/completions |
| context | str | the resolved context name (on 400/404 records: the requested, possibly unresolved, name) |
| model | str \| null | model from the request body |
| key_source | str | byo \| gateway \| none |
| input_tokens / output_tokens | int \| null | upstream-reported token counts (null when unknown) |
| firewall.request | str | ok \| warn \| block |
| firewall.response | str | ok \| warn \| block \| skipped |
| firewall.request_violations / .response_violations | int | violation counts per stage |
| latency_ms | int \| null | wall time from forward to upstream completion (null on pre-forward rejections) |
| status | int | the HTTP status returned to the client |
| streamed | bool | whether the call was an SSE stream |
| reason | str | only on early-rejection records: invalid_json \| invalid_context_config \| unknown_context \| missing_api_key; absent otherwise |
Records are written for every proxy call, rejected or forwarded: all forwarded calls (success or upstream error), the gateway's own 403 / 429 / 502 rejections, and the pre-pipeline rejections (400 invalid JSON / invalid context config, 404 unknown context, 401 missing API key) — which carry the short reason field so e.g. auth-probing attempts are visible to operators. As always: zero message content, zero keys.
{"ts": "2026-06-09T12:00:00+00:00", "request_id": "9be0...", "endpoint": "/v1/messages",
"context": "work", "model": "claude-sonnet-4-5", "key_source": "byo",
"input_tokens": 10, "output_tokens": 5,
"firewall": {"request": "ok", "response": "ok", "request_violations": 0, "response_violations": 0},
"latency_ms": 412, "status": 200, "streamed": false}
Environment variables
Resolved at call time, not import time, so a process can react to environment changes.
| Env var | Default | Meaning |
|---|---|---|
| ZOS_HOME | ~/.zos | state root: contexts/*.yml, gateway/usage.db, gateway/audit.jsonl |
| ZOS_FIREWALL_MODE | enforce | enforce = block on violation; warn = annotate + pass through (any other value → enforce) |
| ZOS_ADMIN_TOKEN | unset | bearer token for the audit tail; endpoint returns 503 while unset |
| ANTHROPIC_API_KEY | unset | gateway-held key (fallback when the client sends no x-api-key) |
| ANTHROPIC_BASE_URL | https://api.anthropic.com | Anthropic upstream base (trailing / stripped) |
| OPENAI_BASE_URL | unset | OpenAI-compatible upstream base including the version path; unset → 501 on that route |
| OPENAI_API_KEY | unset | gateway-held key for the OpenAI route |
| ZOS_GATEWAY_HOST / ZOS_GATEWAY_PORT | 127.0.0.1 / 8788 | bind address/port for the zos-gateway entry point |
Upstream timeouts (constant): connect 10 s, read 600 s, write 60 s, pool 10 s. Redirects are not followed.
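Call-time resolution simply means reading the environment inside the function that needs the value rather than caching it in a module-level constant. The helpers below are a hypothetical sketch of that pattern; the exact-match comparison for `warn` and the trailing-slash handling on OPENAI_BASE_URL are assumptions.

```python
import os
from typing import Mapping, Optional


def firewall_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Anything other than the literal value 'warn' resolves to 'enforce'."""
    env = os.environ if env is None else env
    return "warn" if env.get("ZOS_FIREWALL_MODE") == "warn" else "enforce"


def anthropic_messages_url(env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    base = env.get("ANTHROPIC_BASE_URL") or "https://api.anthropic.com"
    return base.rstrip("/") + "/v1/messages"


def openai_chat_url(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """None means the route is disabled and the gateway answers 501."""
    env = os.environ if env is None else env
    base = env.get("OPENAI_BASE_URL")
    if not base:
        return None
    # The base already carries the version path, e.g. .../v1
    return base.rstrip("/") + "/chat/completions"
```

Passing `env` explicitly keeps the helpers testable; in the running process the default `os.environ` lookup happens on every call, which is what lets a mode change take effect without re-importing the module.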
Context files & zos-core integration
Contexts live at $ZOS_HOME/contexts/<name>.yml (shared with zos-core) and are selected per request via X-ZOS-Context. The name default always resolves: if no default.yml exists it is a built-in permissive context (no deny list, no budget).
context: work
firewall:
  isolation: hard               # affects reported violation severity
  outbound_deny:
    - /srv/clients/acme         # path: segment-aligned prefix match
    - vault://client-secrets    # scheme://token: bounded whole-token match
    - "Project Nightingale"     # plain term: case-insensitive substring
budget:
  daily_tokens: 200000          # input+output, per UTC day; omit = unlimited
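The three matching shapes named in the comments above could be approximated with regular expressions as follows. This is a rough sketch, not the zos-core matcher: the boundary rules (a path or token ends at anything other than a letter, digit, underscore or hyphen), the excerpt window of 30 characters, and the names `compile_deny` and `find_violations` are all assumptions made for illustration.

```python
import re
from typing import Dict, List


def compile_deny(entry: str) -> "re.Pattern[str]":
    """Pick a matching shape from the form of the deny entry."""
    quoted = re.escape(entry)
    if entry.startswith("/"):
        # Path: must start a path and end on a segment boundary,
        # so /srv/clients/acme matches .../acme/q3.pdf but not .../acme2.
        return re.compile(rf"(?<![\w./-]){quoted}(?![\w-])")
    if "://" in entry:
        # scheme://token: whole token only, bounded on both sides.
        return re.compile(rf"(?<![\w-]){quoted}(?![\w-])")
    # Plain term: case-insensitive substring.
    return re.compile(quoted, re.IGNORECASE)


def find_violations(text: str, deny: List[str], isolation: str = "open") -> List[Dict[str, str]]:
    """Return one finding per deny entry that matches `text`."""
    severity = "block" if isolation == "hard" else "warn"
    findings = []
    for entry in deny:
        match = compile_deny(entry).search(text)
        if match:
            start = max(0, match.start() - 30)
            end = min(len(text), match.end() + 30)
            # Collapse whitespace so newlines are flattened in the excerpt.
            excerpt = " ".join(text[start:end].split())
            findings.append({"pattern": entry, "excerpt": excerpt, "severity": severity})
    return findings
```

Note that this sketch errs toward over-matching in one case: a path followed by a dot, such as a sentence-ending period or a file extension, still counts as a match.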
The gateway codes against a thin adapter with the integration contract:
load_context(name: str, root: Path | None = None) -> Context
# Context fields: name, isolation, outbound_deny, read_allow,
# allowed_tools, register, role_default, extras
check_outbound(text: str, context: Context) -> list[Violation]
# Violation fields: pattern, excerpt, severity
- If the sibling zos_core package is importable it is used (auto-detected, reported by /healthz); otherwise a built-in fallback (YAML loader + the same three matching shapes) keeps the gateway fully functional standalone.
- budget.daily_tokens is read from Context.extras["budget"]["daily_tokens"]; non-integer or negative values mean unlimited.
- Both loaders default isolation to open and accept .yml as well as .yaml context files (.yml wins when both exist).
- A context yml that exists but cannot be parsed is a structured 400 zos_invalid_context_config on the proxy endpoints, never a 500, in both modes, and the rejection is audited.
- read_allow and allowed_tools are loaded and exposed on the context but are not enforced by the gateway pipeline today.
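Two of the rules in the list above, the file-extension precedence and the budget parsing, can be sketched as below. Both helper names are hypothetical, `root` stands for $ZOS_HOME, and treating booleans as non-integers is an assumption (in Python `True` is technically an int).

```python
from pathlib import Path
from typing import Any, Dict, Optional


def context_path(name: str, root: Path) -> Optional[Path]:
    """Locate a context file; .yml is tried first, so it wins over .yaml."""
    for suffix in (".yml", ".yaml"):
        candidate = root / "contexts" / f"{name}{suffix}"
        if candidate.is_file():
            return candidate
    return None  # unknown context, unless the name is the built-in default


def daily_limit(extras: Dict[str, Any]) -> Optional[int]:
    """Read budget.daily_tokens from Context.extras; None means unlimited."""
    budget = extras.get("budget")
    value = budget.get("daily_tokens") if isinstance(budget, dict) else None
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        return None  # missing, non-integer or negative -> unlimited
    return value
```

Under these rules a limit of 0 is still a valid budget: with a pre-flight "meets or exceeds" check it would refuse every call for that context.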