Documentation
API Reference
POST /v1/complete
The primary endpoint. Compresses your prompt, routes to the cheapest capable model, checks the cache, and returns quality scores — all in a single call.
POSThttps://api.laghav.ai/v1/complete
Request body
request.json
{messages: [{role: "system", content: "You are a helpful assistant"},{role: "user", content: "string"}],model: "auto",max_tokens: 1000,stream: false,laghav_options: {compress: true,route: true,cache: true,score: true,budget_id: "engineering",mask_pii: false,protocol_id: "acme-corp-v1",skip_rules: ["intent"],max_aggressiveness: 0.7,conversation_id: "conv_abc123",max_turns_to_keep: 10,agent_run_id: "agent_xyz789"}}
Request fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| messages | array | Yes | — | Array of {role, content} objects. Standard OpenAI-compatible format. |
| model | string | No | "auto" | Target model. Use "auto" for Laghav routing, or a specific model name. |
| max_tokens | integer | No | 1000 | Maximum tokens in the response. |
| stream | boolean | No | false | If true, response is streamed as Server-Sent Events. |
laghav_options.compress | boolean | No | true | Enable prompt compression pipeline. |
laghav_options.route | boolean | No | true | Enable ML model routing. |
laghav_options.cache | boolean | No | true | Enable semantic dedup cache. |
laghav_options.score | boolean | No | true | Include quality score in response. |
laghav_options.budget_id | string | No | — | Team budget to charge this call against. |
laghav_options.mask_pii | boolean | No | false | Enable PII masking via Presidio (Phase 2). |
laghav_options.max_aggressiveness | float | No | 0.5 | Compression level 0.0 (light) to 1.0 (maximum). |
laghav_options.skip_rules | array | No | [] | Compression rule names to skip. See Compression docs. |
laghav_options.conversation_id | string | No | — | Enables multi-turn conversation optimization. |
laghav_options.agent_run_id | string | No | — | Enables agent loop cost tracking and safety guard. |
Supported models
| Model ID | Provider | Tier | Cost / 1M tokens |
|---|---|---|---|
auto | Laghav selects | Auto-routing | Cheapest capable |
claude-haiku-3 | Anthropic | Cheapest | $0.25 |
claude-sonnet-4 | Anthropic | Balanced | $3.00 |
claude-opus-4 | Anthropic | Most capable | $15.00 |
gpt-4o-mini | OpenAI | Cheapest | $0.15 |
gpt-4o | OpenAI | Balanced | $5.00 |
gemini-1.5-flash | Cheapest | $0.075 | |
gemini-1.5-pro | Balanced | $3.50 |
Response (200)
response.json
{id: "lgh_req_abc123",object: "chat.completion",created: 1717257600,choices: [{index: 0,message: {role: "assistant",content: "Here is the analysis..."},finish_reason: "stop"}],model: "claude-haiku-3-20240307",laghav_meta: {original_tokens: 847,compressed_tokens: 340,compression_ratio: 0.60,quality_score: 94,cost_original_usd: 0.000212,cost_actual_usd: 0.000085,saved_usd: 0.000127,routing_reason: "faq_pattern",model_requested: "auto",rules_applied: ["filler", "preamble"],cache_hit: false,pii_masked: false,latency_overhead_ms: 18,conversation_id: "conv_abc123"}}
laghav_meta fields
| Field | Type | Description |
|---|---|---|
| original_tokens | integer | Token count before compression |
| compressed_tokens | integer | Token count after compression |
| compression_ratio | float | Fraction of tokens removed (e.g. 0.60 = 60% removed) |
| quality_score | integer | 0–100 semantic similarity score of compressed vs original |
| cost_original_usd | float | What this call would have cost without Laghav |
| cost_actual_usd | float | What you actually paid |
| saved_usd | float | cost_original_usd - cost_actual_usd |
| routing_reason | string | Why this model was selected (e.g. 'faq_pattern', 'code_task') |
| model_requested | string | The model field you sent (often 'auto') |
| rules_applied | array | Compression rules that modified this prompt |
| cache_hit | boolean | true if response was served from semantic cache |
| pii_masked | boolean | true if PII was detected and masked |
| latency_overhead_ms | integer | Milliseconds Laghav added to total latency |
Streaming
Set "stream": true to receive Server-Sent Events. Each chunk is a partial choices.delta. The final chunk carries the full laghav_meta.
bash
curl -X POST https://api.laghav.ai/v1/complete \-H "Authorization: Bearer lgh_live_xxx" \-H "Content-Type: application/json" \--no-buffer \-d '{"messages":[...],"model":"auto","stream":true}'# data: {"choices":[{"index":0,"delta":{"content":"Here"},"finish_reason":null}]}# data: {"choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}# data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"laghav_meta":{...}}# data: [DONE]
ℹOpenAI compatible
The response schema is a superset of the OpenAI Chat Completions format. Any library that works with OpenAI (LangChain, LlamaIndex, LiteLLM) works with Laghav with only the base URL and API key changed.