API Reference
The Perathos API is OpenAI-compatible. Any SDK that supports a custom base_url will work without modification. All endpoints require a Perathos API key in the Authorization header.
Base URL & Authentication
Base URL: https://api.perathos.com/v1
Authorization: Bearer pk_live_...   # Perathos API key in all requests
/v1/chat/completions
Drop-in replacement for the OpenAI Chat Completions endpoint. Request format is identical. Perathos routes the request through the provider configuration registered for your workspace, verifies the response, and returns it with additional headers.
Request headers
Authorization (required): Bearer {PERATHOS_API_KEY}
Content-Type (required): application/json
X-LLM-Provider-Key: Transitional header for onboarding only. Production workspaces should use server-side provider configuration rather than sending provider keys per request.
X-LLM-Model: Override the model identifier forwarded to your LLM provider. Defaults to the model field in the request body.
X-VRL-Flag-Below: Override the FLAG threshold for this request. Float 0.0–1.0. Default: 0.80.
X-VRL-Block-Below: Override the BLOCK threshold for this request. Float 0.0–1.0. Default: 0.50.
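As a minimal sketch of how these headers fit together, the helper below assembles a request with Python's standard library without sending it. The name build_chat_request and its parameters are illustrative and not part of the Perathos API. The POST method is assumed from the OpenAI endpoint this one mirrors, and the range check reflects the 0.0–1.0 bounds stated above.

```python
import json
import urllib.request

BASE_URL = "https://api.perathos.com/v1"


def build_chat_request(api_key, body, flag_below=None, block_below=None):
    """Build (but do not send) a chat completions request.

    Hypothetical helper for illustration only.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Optional per-request threshold overrides; both are floats in 0.0-1.0.
    overrides = (
        ("X-VRL-Flag-Below", flag_below),
        ("X-VRL-Block-Below", block_below),
    )
    for name, value in overrides:
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        headers[name] = str(value)
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",  # assumed, matching the OpenAI endpoint
    )
```

Passing the returned object to urllib.request.urlopen would send it; omitting both overrides leaves the workspace defaults in effect.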
Request body
Standard OpenAI Chat Completions request body, with an optional perathos extension object.
{
"model": "gpt-4o",
"messages": [
{ "role": "system", "content": "You are a financial analyst." },
{ "role": "user", "content": "Summarise Basel III capital requirements." }
],
"temperature": 0.3,
"max_tokens": 500,
// optional Perathos extension
"perathos": {
"verifiers": ["symbolic_solver", "knowledge_graph"], // enable only specific verifiers
"flag_below": 0.85, // per-request threshold override
"block_below": 0.60,
"domain": "financial_regulations" // hint to knowledge graph verifier
}
}
Response headers
x-vrl-verdict: PASS | FLAG | BLOCK
x-vrl-model: Fingerprinted source model (provider/model-name)
x-vrl-bundle-id: VRL Proof Bundle ID. Use to retrieve the full bundle.
x-vrl-confidence: Aggregated confidence score, float 0.0–1.0
x-vrl-latency-ms: Verification latency added by Perathos, in milliseconds
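One way a client might collect these headers into a typed object is sketched below. The VrlResult class and parse_vrl_headers function are hypothetical names introduced for illustration. Every field is treated as optional here, on the assumption that some headers can be absent (for example, the verdict in pass-through streaming).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VrlResult:
    """Hypothetical container for the x-vrl-* response headers."""
    verdict: Optional[str]
    confidence: Optional[float]
    bundle_id: Optional[str]
    model: Optional[str]
    latency_ms: Optional[float]


def parse_vrl_headers(headers: Mapping[str, str]) -> VrlResult:
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}

    def as_float(name: str) -> Optional[float]:
        raw = h.get(name)
        return float(raw) if raw is not None else None

    return VrlResult(
        verdict=h.get("x-vrl-verdict"),
        confidence=as_float("x-vrl-confidence"),
        bundle_id=h.get("x-vrl-bundle-id"),
        model=h.get("x-vrl-model"),
        latency_ms=as_float("x-vrl-latency-ms"),
    )
```

Checking result.verdict before reading choices[0].message.content keeps a BLOCK explanation from being mistaken for model output.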
Response body
Identical to the OpenAI Chat Completions response format. For BLOCK verdicts, choices[0].message.content is replaced with a block explanation and verification context.
{
"id": "chatcmpl-...",
"object": "chat.completion",
"model": "gpt-4o",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Basel III requires banks to maintain..." // BLOCK: explanation replaces this
},
"finish_reason": "stop"
}],
"usage": { "prompt_tokens": 28, "completion_tokens": 142, "total_tokens": 170 }
}
/v1/bundles/{bundle_id}
Retrieve the full VRL Proof Bundle by ID. Returns the bundle fields available for the deployment in scope, including verifier scores, extracted claims, timestamps, and configured proof or signature fields where enabled.
GET https://api.perathos.com/v1/bundles/bndl_01j9xk2m3n4p5q6r7s8t9u0v
Authorization: Bearer pk_live_...

# Response: 200 OK
# Body: VRL Proof Bundle object (see VRL Proof Bundle schema reference)
/v1/bundles
List bundles for your account. Supports pagination and filtering by verdict, date range, and model.
GET https://api.perathos.com/v1/bundles?verdict=BLOCK&limit=50&after=bndl_...
Authorization: Bearer pk_live_...

# Query parameters:
#   verdict         string    Filter by verdict: PASS, FLAG, BLOCK
#   after           string    Cursor-based pagination: bundle_id to start after
#   before          string    Cursor-based pagination: bundle_id to end before
#   limit           integer   Max results to return (default 20, max 100)
#   created_after   ISO 8601 datetime
#   created_before  ISO 8601 datetime
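The query parameters above can be assembled with the standard library, as in this sketch. The function name build_bundles_url is hypothetical, and the lower bound of 1 on limit is an assumption; only the default of 20 and the maximum of 100 come from the parameter list.

```python
from urllib.parse import urlencode

BUNDLES_URL = "https://api.perathos.com/v1/bundles"
VERDICTS = {"PASS", "FLAG", "BLOCK"}


def build_bundles_url(verdict=None, after=None, before=None, limit=20,
                      created_after=None, created_before=None):
    """Return a list-bundles URL with only the supplied filters set."""
    if verdict is not None and verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")
    if not 1 <= limit <= 100:  # lower bound of 1 is an assumption
        raise ValueError("limit must be between 1 and 100")
    params = {
        "verdict": verdict,
        "after": after,
        "before": before,
        "limit": limit,
        "created_after": created_after,
        "created_before": created_before,
    }
    # Drop unset filters so they are not sent as empty values.
    query = {name: value for name, value in params.items() if value is not None}
    return f"{BUNDLES_URL}?{urlencode(query)}"
```

To page forward, a caller would pass the last bundle ID of one page as after on the next call. The shape of the list response is not shown in this reference, so that step is left out of the sketch.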
Error Codes
400 invalid_request: Malformed request body or missing required fields.
401 authentication_error: Missing or invalid Authorization header.
402 quota_exceeded: Monthly call quota exceeded. Upgrade tier or contact sales.
422 unprocessable_response: LLM provider returned an unparseable response. Bundle not generated.
429 rate_limit_exceeded: Too many requests. Retry after the value in Retry-After header.
502 llm_provider_error: Upstream LLM provider returned an error. Error details forwarded in response body.
503 verification_unavailable: Verification pipeline temporarily unavailable. Response not returned. Retry with exponential backoff.
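A client can keep this table as a small lookup, sketched below. The describe_error helper is a hypothetical name. The retryable flag follows the recommendation elsewhere in this reference to retry only on 429 and 5xx responses.

```python
# Status codes and error identifiers copied from the error code table.
ERROR_CODES = {
    400: "invalid_request",
    401: "authentication_error",
    402: "quota_exceeded",
    422: "unprocessable_response",
    429: "rate_limit_exceeded",
    502: "llm_provider_error",
    503: "verification_unavailable",
}


def describe_error(status: int):
    """Return (error_code, retryable) for an HTTP status.

    error_code is None for statuses the table does not list.
    """
    code = ERROR_CODES.get(status)
    # Retry only on rate limiting and server-side failures.
    retryable = status == 429 or 500 <= status < 600
    return code, retryable
```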
Streaming
Streaming is enabled by setting stream: true in the request body. Because Perathos must observe the complete LLM response before it can issue a verdict, two modes trade verdict-before-delivery against time-to-first-token.
Buffered: Perathos buffers the LLM stream, runs verification, then streams the response with the verdict header set. Expect the full pipeline latency (800 ms–2 s) before the first token. Use when a BLOCK must gate delivery.
Pass-through: Perathos streams in real time and verifies in parallel. At stream end, the HTTP trailing header x-vrl-verdict-deferred: <bundle_id> identifies the bundle; retrieve the verdict from the bundle endpoint. Suitable for audit, not gating, because the response has already reached the user by the time verification completes.
# Per-request mode override
X-VRL-Stream-Mode: buffered | passthrough

# Account default is configurable via the dashboard.
# Default for new accounts: buffered.

# Pass-through trailing header (only present when X-VRL-Stream-Mode: passthrough)
x-vrl-verdict-deferred: bndl_01j9xk2m3n4p5q6r7s8t9u0v
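The choice between the two modes, and the follow-up lookup in pass-through mode, might look like the sketch below. Both function names are hypothetical. HTTP clients differ in whether they expose trailing headers, so the trailers argument is assumed to be a mapping the caller has already obtained.

```python
from typing import Mapping, Optional

BUNDLES_URL = "https://api.perathos.com/v1/bundles"


def stream_mode_header(gate_on_block: bool) -> dict:
    """Pick the stream mode: buffered when a BLOCK must gate delivery."""
    mode = "buffered" if gate_on_block else "passthrough"
    return {"X-VRL-Stream-Mode": mode}


def deferred_bundle_url(trailers: Mapping[str, str]) -> Optional[str]:
    """Return the bundle URL named by the pass-through trailer, if present."""
    h = {name.lower(): value for name, value in trailers.items()}
    bundle_id = h.get("x-vrl-verdict-deferred")
    if bundle_id is None:
        return None  # buffered mode, or the stream has not ended yet
    return f"{BUNDLES_URL}/{bundle_id}"
```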
Rate limits
Starter: 60 RPM, 1,000 RPD
Platform: 600 RPM, no daily cap (subject to monthly plan volume)
Enterprise: Negotiated per contract, no enforced ceiling outside the contract
Every response includes rate-limit headers:
X-RateLimit-Limit: Current period limit
X-RateLimit-Remaining: Calls remaining in the current window
X-RateLimit-Reset: Unix epoch when the window resets
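These headers let a client pause before it hits a 429. The sketch below computes how long to wait; seconds_until_allowed is a hypothetical helper, and it assumes X-RateLimit-Reset is expressed in seconds since the Unix epoch.

```python
import time
from typing import Mapping, Optional


def seconds_until_allowed(headers: Mapping[str, str],
                          now: Optional[float] = None) -> float:
    """Return 0.0 if calls remain, else the seconds until the window resets."""
    h = {name.lower(): value for name, value in headers.items()}
    remaining = int(h["x-ratelimit-remaining"])
    if remaining > 0:
        return 0.0
    reset_at = float(h["x-ratelimit-reset"])  # assumed to be epoch seconds
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - current)
```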
On 429 responses:
HTTP/1.1 429 Too Many Requests
Retry-After: 12
{
"error": "rate_limit_exceeded",
"retry_after_seconds": 12
}
Rate limits are scoped per Perathos API key, not per source IP. Multiple keys (for example, one per service) get independent limits.
Retries and idempotency
Clients should send an Idempotency-Key request header (UUIDv4) per logical request. When idempotency is enabled for the workspace, a repeated key with an identical payload returns the cached bundle ID within the configured retention window. A repeated key with a different payload returns 409 Conflict; treat that as a programming error.
Network-level retries are safe with an idempotency key; without one, retried calls run new verifications and are billed separately. Recommended retry strategy: exponential backoff, max 3 attempts, only on 5xx and 429.
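The recommended strategy can be sketched as a small retry loop. The send callable is a hypothetical transport hook that takes extra headers and returns (status, response_headers, body). The one-second base delay is an assumption, and Retry-After is assumed to carry a number of seconds, as in the 429 example.

```python
import time
import uuid


def call_with_retries(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() up to max_attempts times, retrying only on 429 and 5xx."""
    # One key for the whole logical request, reused across every retry.
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        status, resp_headers, body = send({"Idempotency-Key": key})
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts:
            return status, resp_headers, body
        h = {name.lower(): value for name, value in resp_headers.items()}
        if "retry-after" in h:
            delay = float(h["retry-after"])  # server-provided wait
        else:
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
        sleep(delay)
```

Generating the key once outside the loop is the point of the sketch: a fresh key on each attempt would make every retry a new, separately billed verification.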
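# Python: idempotency key per logical request
# Assumes the openai package (v1+); any OpenAI-compatible SDK works similarly.
import uuid

from openai import OpenAI

client = OpenAI(base_url="https://api.perathos.com/v1", api_key="pk_live_...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_headers={"Idempotency-Key": str(uuid.uuid4())},
)
Degraded mode and fail-policy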
Verifier-level degradation
If 1–2 of the seven verifiers are temporarily unavailable, the verdict is still issued from the remaining verifiers. The bundle is marked degraded: true and the response carries x-vrl-degraded: true. The confidence score is renormalised over the available verifiers.
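To illustrate what renormalising over the available verifiers can mean, the sketch below assumes the aggregate is a weighted mean of per-verifier scores. That aggregation rule, the weights, and the name verifier_c are assumptions made for the example; this reference does not specify how Perathos combines scores.

```python
def renormalised_confidence(scores, weights):
    """Weighted mean over available verifiers (assumed aggregation rule).

    scores maps verifier name to a score in 0.0-1.0, or None if unavailable.
    Returns (confidence, degraded).
    """
    available = {name: s for name, s in scores.items() if s is not None}
    if not available:
        raise ValueError("no verifier scores available")
    # Divide by the weight of the verifiers that responded, so the
    # missing ones do not drag the score towards zero.
    total_weight = sum(weights[name] for name in available)
    confidence = sum(weights[name] * s for name, s in available.items()) / total_weight
    degraded = len(available) < len(scores)
    return confidence, degraded


scores = {"symbolic_solver": 0.9, "knowledge_graph": 0.7, "verifier_c": None}
weights = {"symbolic_solver": 1.0, "knowledge_graph": 1.0, "verifier_c": 1.0}
confidence, degraded = renormalised_confidence(scores, weights)
```

With equal weights the result is the plain average of the two available scores, and degraded is True because one verifier is missing.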
Pipeline-level failure
With four or more verifiers down (or an upstream verifier-inference outage), default behaviour is fail-closed: 503 response, no LLM call forwarded, no bundle generated. This is the safe default for regulated workflows: no unverified response leaves the proxy.
Fail-open option
Enterprise customers can opt in to fail-open behaviour per contract. The LLM call is forwarded and the response returned with x-vrl-verdict: UNVERIFIED and x-vrl-degraded: pipeline_unavailable. The customer explicitly accepts the audit gap during pipeline outages. Not available on Starter or Platform tiers.
Proxy availability is independent of verifier availability
If the pipeline is down but the proxy gateway is healthy, fail-open customers still get their LLM responses with an explicit unverified header; the gating decision is theirs.
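On the client side, the states described in this section can be told apart from the status code and headers alone. The sketch below shows one way to do that; classify_response and its four labels are hypothetical, and what an application does with each label remains its own policy decision.

```python
from typing import Mapping


def classify_response(status: int, headers: Mapping[str, str]) -> str:
    """Map a response to 'unavailable', 'unverified', 'degraded' or 'verified'."""
    h = {name.lower(): value for name, value in headers.items()}
    if status == 503:
        # Fail-closed: no LLM response was returned at all.
        return "unavailable"
    verdict = h.get("x-vrl-verdict")
    degraded = h.get("x-vrl-degraded")
    if verdict == "UNVERIFIED" or degraded == "pipeline_unavailable":
        # Fail-open: the response was forwarded without verification.
        return "unverified"
    if degraded == "true":
        # Verdict issued from a reduced set of verifiers.
        return "degraded"
    return "verified"
```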