Quickstart

Perathos is a drop-in proxy. You change one line of code — your SDK's base_url — and every subsequent API call is intercepted, verified, and returned with a signed verdict. Integration takes under 10 minutes.

1

Get your API key

Request access from the demo form — we provision API keys during onboarding. Your Perathos API key is separate from your LLM provider key. Keep both.

PERATHOS_API_KEY=pk_live_...
OPENAI_API_KEY=sk-...  # or your existing LLM provider key
2

Register your LLM provider key (one-time)

Register your LLM provider key once via the providers endpoint. Key storage, logging controls, and provider options are confirmed for the deployment before production use. Multiple customer-approved providers can be registered and selected per request via the model field.

curl -X POST https://api.perathos.com/v1/providers \
  -H "Authorization: Bearer $PERATHOS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "api_key": "sk-...",
    "default_model": "gpt-4o"
  }'
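
The same registration call can be made from Python. The sketch below mirrors the curl command using only the standard library. `build_register_request` is a hypothetical helper name, not part of any Perathos SDK; the URL and payload fields are taken from the example above. The request is only constructed here, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

PROVIDERS_URL = "https://api.perathos.com/v1/providers"


def build_register_request(perathos_key, provider, provider_key, default_model):
    """Build (but do not send) the POST that registers a provider key."""
    payload = {
        "provider": provider,
        "api_key": provider_key,
        "default_model": default_model,
    }
    return urllib.request.Request(
        PROVIDERS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {perathos_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder keys, as in the curl example
req = build_register_request("pk_live_...", "openai", "sk-...", "gpt-4o")
# To send it: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes the payload easy to inspect or log (with the key redacted) before the one-time registration is made.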
3

Change one line of code

Point your SDK at the Perathos proxy. Perathos routes to the registered provider for the requested model — no provider key in the request.

Python (openai SDK)

import os

import openai

client = openai.OpenAI(
    api_key=os.environ["PERATHOS_API_KEY"],
    base_url="https://api.perathos.com/v1",
)

# All existing code below unchanged
response = client.chat.completions.create(
    model="gpt-4o",   # Perathos routes to the registered provider for this model
    messages=[{"role": "user", "content": "What is the Basel III capital ratio?"}],
)

Node.js (openai SDK)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.PERATHOS_API_KEY,
  baseURL: "https://api.perathos.com/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is the Basel III capital ratio?" }],
});

Per-request provider-key forwarding via X-LLM-Provider-Key is supported as a transitional fallback (for example, when you cannot pre-register a key). Server-side registration is the recommended pattern for production deployments.
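
As an illustration of the fallback, the provider key could be attached per request through the OpenAI SDK's `extra_headers` argument. This is a sketch under assumptions: only the header name comes from the text above, `provider_key_headers` is a hypothetical helper, and sending the raw key as the header value is an assumed format.

```python
def provider_key_headers(provider_key):
    """Headers for per-request provider-key forwarding (transitional fallback)."""
    if not provider_key:
        raise ValueError("provider key is empty; register it server-side instead")
    # Assumption: the header carries the raw provider key as its value.
    return {"X-LLM-Provider-Key": provider_key}


# Usage with the client from step 3 (not executed here):
# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "..."}],
#     extra_headers=provider_key_headers(os.environ["OPENAI_API_KEY"]),
# )
```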

4

Read the verdict headers

Every response includes five Perathos headers. The response body is identical to what your LLM provider would have returned.

x-vrl-verdict: PASS          # PASS | FLAG | BLOCK
x-vrl-model: openai/gpt-4o   # fingerprinted source model
x-vrl-bundle-id: bndl_...    # retrieve full proof bundle with this ID
x-vrl-confidence: 0.94       # aggregated confidence score (0.0–1.0)
x-vrl-latency-ms: 1240       # verification latency added

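Read the headers from the raw HTTP response. The parsed completion object returned by the openai SDK does not expose headers, so use the SDK's with_raw_response wrapper:

# Python
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the Basel III capital ratio?"}],
)
response = raw.parse()  # the usual ChatCompletion object

verdict = raw.headers["x-vrl-verdict"]
bundle_id = raw.headers["x-vrl-bundle-id"]
confidence = float(raw.headers["x-vrl-confidence"])

if verdict == "BLOCK":
    # response.choices[0].message.content contains the signed block explanation
    raise ValueError(f"Response blocked by Perathos: {bundle_id}")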
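
For typed handling, the x-vrl-* headers listed above can be parsed into a small record. The following is a minimal sketch, not a Perathos API: `Verdict` and `parse_verdict_headers` are hypothetical names, and the bundle ID in the usage lines is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    verdict: str       # "PASS", "FLAG", or "BLOCK"
    model: str         # fingerprinted source model
    bundle_id: str     # ID for retrieving the proof bundle
    confidence: float  # aggregated confidence score, 0.0 to 1.0
    latency_ms: int    # verification latency added


def parse_verdict_headers(headers):
    """Parse the x-vrl-* headers from a mapping of header name to value."""
    # Header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    verdict = h["x-vrl-verdict"]
    if verdict not in ("PASS", "FLAG", "BLOCK"):
        raise ValueError(f"Unexpected verdict: {verdict!r}")
    return Verdict(
        verdict=verdict,
        model=h["x-vrl-model"],
        bundle_id=h["x-vrl-bundle-id"],
        confidence=float(h["x-vrl-confidence"]),
        latency_ms=int(h["x-vrl-latency-ms"]),
    )


v = parse_verdict_headers({
    "X-VRL-Verdict": "PASS",
    "x-vrl-model": "openai/gpt-4o",
    "x-vrl-bundle-id": "bndl_example",  # placeholder ID
    "x-vrl-confidence": "0.94",
    "x-vrl-latency-ms": "1240",
})
print(v.verdict, v.confidence)
```

Rejecting unknown verdict strings up front keeps downstream branching to the three documented values.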
5

Retrieve the full Proof Bundle

The VRL Proof Bundle contains the full evidence trail: verifier scores, extracted claims, timestamps, findings, and configured integrity metadata. Retrieve it by bundle ID.

import os

import httpx

bundle = httpx.get(
    f"https://api.perathos.com/v1/bundles/{bundle_id}",
    headers={"Authorization": f"Bearer {os.environ['PERATHOS_API_KEY']}"},
).json()

print(bundle["verdict"])            # "PASS"
print(bundle["confidence_score"])   # 0.94
print(bundle["verifiers"])          # list of 7 verifier results
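
Before relying on a bundle for audit, it can be worth checking that it carries the fields used above and that it agrees with the verdict header. A small sketch under assumptions: `bundle_url` and `check_bundle` are hypothetical helpers, and the only bundle fields assumed are the three printed in the example (`verdict`, `confidence_score`, `verifiers`).

```python
from urllib.parse import quote

BUNDLES_URL = "https://api.perathos.com/v1/bundles"


def bundle_url(bundle_id):
    """Build the bundle endpoint URL, escaping the ID defensively."""
    if not bundle_id:
        raise ValueError("bundle_id is required")
    return f"{BUNDLES_URL}/{quote(bundle_id, safe='')}"


def check_bundle(bundle, header_verdict):
    """Return True if the bundle has the expected fields and matches the header verdict."""
    for field in ("verdict", "confidence_score", "verifiers"):
        if field not in bundle:
            raise KeyError(f"bundle is missing {field!r}")
    return bundle["verdict"] == header_verdict
```

A mismatch between the header verdict and the bundle verdict would be worth logging alongside the bundle ID rather than silently ignoring.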
6

Handle BLOCK responses

When a response is BLOCKed, the content is replaced with a signed explanation. The original (blocked) response content is not returned. The Proof Bundle contains the full evidence of why the response was blocked.

import logging

log = logging.getLogger(__name__)

def handle_completion(client, **request):
    # with_raw_response exposes the HTTP headers alongside the parsed body
    raw = client.chat.completions.with_raw_response.create(**request)
    response = raw.parse()

    verdict = raw.headers.get("x-vrl-verdict")
    content = response.choices[0].message.content

    if verdict == "BLOCK":
        bundle_id = raw.headers["x-vrl-bundle-id"]
        # content = signed explanation of the block reason
        # Log bundle_id for audit trail
        log.warning("AI response blocked: bundle_id=%s", bundle_id)
        return {"error": "Response failed verification", "audit_id": bundle_id}

    if verdict == "FLAG":
        # Response passed but confidence < 0.80
        # Surface for human review or log for audit
        log.info("Response flagged for review: confidence=%s", raw.headers["x-vrl-confidence"])

    # verdict == "PASS" (or FLAG after logging): proceed normally
    return {"content": content}
7

Configure verdict thresholds (optional)

Default thresholds: PASS ≥ 0.80, FLAG 0.50–0.80, BLOCK < 0.50. Override per request or globally in your dashboard.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_headers={
        # Stricter thresholds for high-stakes queries
        "X-VRL-Flag-Below": "0.90",
        "X-VRL-Block-Below": "0.70",
    },
)
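
The threshold rule can be sketched client-side as follows, for example to sanity-check custom cut-offs before sending them. This illustrates the documented ranges and is not Perathos's implementation; `classify` is a hypothetical name, and the boundary handling (exactly 0.80 is PASS, exactly 0.50 is FLAG) is inferred from "PASS ≥ 0.80" and "BLOCK < 0.50".

```python
def classify(confidence, flag_below=0.80, block_below=0.50):
    """Map a confidence score to PASS / FLAG / BLOCK using the documented cut-offs."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if block_below > flag_below:
        raise ValueError("block_below must not exceed flag_below")
    if confidence < block_below:
        return "BLOCK"
    if confidence < flag_below:
        return "FLAG"
    return "PASS"


print(classify(0.94))                                     # PASS
print(classify(0.85, flag_below=0.90, block_below=0.70))  # FLAG
```

With the stricter per-request values from the example, a score of 0.85 that would pass under the defaults is flagged instead.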

Streaming completions

Perathos must observe the complete LLM response to issue a verdict. Streaming is supported in two modes; choose based on whether you need the verdict before the response reaches the user, or only for audit.

Mode A — Buffered verification

Recommended for regulated or gating workflows. Perathos buffers the LLM stream, runs verification, then streams the response to the client with the verdict header set. Adds full-pipeline latency (800ms–2s) before the first token reaches the user. Use when the verdict must gate delivery — a BLOCK must prevent the user from seeing the response.

# Python (openai SDK) — buffered (default)
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
    stream=True,
    extra_headers={"X-VRL-Stream-Mode": "buffered"},
)
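# Verdict header is set on the response before any chunk is emitted,
# so it can be read before the stream is consumed
verdict = stream.response.headers["x-vrl-verdict"]
for chunk in stream:
    if chunk.choices:  # skip chunks that carry no choices
        print(chunk.choices[0].delta.content or "", end="")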

Mode B — Pass-through streaming

Recommended for chat UX where time-to-first-token matters. Perathos streams the response in real time and runs verification in parallel. At stream end it sends an HTTP trailer carrying the bundle ID, x-vrl-verdict-deferred: <bundle_id>. Once the trailer arrives, the verdict can be retrieved from the bundle endpoint. This mode is suitable for audit but not for gating, because the response has already reached the user by the time verification completes.

# Python (httpx) — pass-through with trailers
import os

import httpx

with httpx.stream(
    "POST",
    "https://api.perathos.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['PERATHOS_API_KEY']}",
        "X-VRL-Stream-Mode": "passthrough",
    },
    json={"model": "gpt-4o", "messages": [...], "stream": True},
) as r:
    for line in r.iter_lines():
        print(line)
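    # Trailers arrive only after the stream ends. NOTE: httpx does not document
    # a `trailers` attribute on Response, so the lookup is guarded; you may need
    # an HTTP client that exposes trailers.
    trailers = getattr(r, "trailers", None) or {}
    bundle_id = trailers.get("x-vrl-verdict-deferred")
    if bundle_id:
        bundle = httpx.get(
            f"https://api.perathos.com/v1/bundles/{bundle_id}",
            headers={"Authorization": f"Bearer {os.environ['PERATHOS_API_KEY']}"},
        ).json()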

If you require gating in a streaming UX, set stream=False for high-stakes calls — the latency cost is the price of verifying before delivery. For the rest of the conversation, stream normally.
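
One way to apply this split is to choose the request options per call. A minimal sketch, assuming the caller already knows which calls are high-stakes; `request_options` is a hypothetical helper, and the header name and value come from the pass-through example above.

```python
def request_options(high_stakes):
    """Pick streaming settings: gate high-stakes calls, stream the rest."""
    if high_stakes:
        # Non-streaming: the verdict header arrives with the full response,
        # so a BLOCK can be handled before anything is shown to the user.
        return {"stream": False, "extra_headers": {}}
    # Pass-through: lowest time-to-first-token; the verdict is for audit only.
    return {"stream": True, "extra_headers": {"X-VRL-Stream-Mode": "passthrough"}}


# Usage with the client from step 3 (not executed here):
# client.chat.completions.create(
#     model="gpt-4o", messages=messages, **request_options(high_stakes=True)
# )
```

How a call is judged high-stakes (for example by route, user role, or topic) is left to the application.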

Typed SDKs and reference implementations

The examples above use the OpenAI SDK pointed at the Perathos base URL, which works unchanged. For richer bundle helpers, signature verification, and typed verdict handling, use the open-source VRL Protocol implementations: