Python Quickstart

Get your first Laghav-compressed LLM call running in Python in under 5 minutes. One pip install. One endpoint change.

Step 1 — Install the SDK

bash
pip install laghav

Requires Python 3.8+. httpx is the SDK's only required runtime dependency.

Step 2 — Get your API key

Sign up at app.laghav.ai/signup (no credit card required). Production keys start with lgh_live_; development keys start with lgh_test_.

bash
# Set once in your environment
export LAGHAV_API_KEY=lgh_live_xxxxxxxxxxxx
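
Because the prefix is what distinguishes production keys from development keys, a small startup check can catch a misconfigured environment early. The following is a minimal sketch using only the standard library. The helper name `key_environment` and the `KEY_PREFIXES` mapping are hypothetical and not part of the Laghav SDK; the only facts taken from this page are the two prefixes.

```python
import os

# Prefixes as documented in Step 2; the mapping itself is illustrative.
KEY_PREFIXES = {"lgh_live_": "production", "lgh_test_": "development"}


def key_environment(api_key: str) -> str:
    """Return 'production' or 'development' based on the key prefix."""
    for prefix, environment in KEY_PREFIXES.items():
        if api_key.startswith(prefix):
            return environment
    raise ValueError("API key must start with lgh_live_ or lgh_test_")


if __name__ == "__main__":
    key = os.environ.get("LAGHAV_API_KEY", "")
    if key:
        print("Key environment:", key_environment(key))
    else:
        print("LAGHAV_API_KEY is not set")
```

Running a check like this before creating the client makes it harder to send test traffic with a production key by accident.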

Step 3 — Make your first call

first_call.py
import os

from laghav import LaghavClient

client = LaghavClient(api_key=os.environ["LAGHAV_API_KEY"])

response = client.complete(
    messages=[{"role": "user", "content": "Hey could you help me understand what caused the revenue drop last quarter?"}],
    model="auto",  # Laghav picks the cheapest capable model
)

# Your compressed response
print(response.choices[0].message.content)

# Savings metadata (always included)
meta = response.laghav_meta
print("Original tokens: ", meta.original_tokens)  # 847
print("Compressed tokens: ", meta.compressed_tokens)  # 340
print("Compression ratio: ", meta.compression_ratio)  # 0.60
print("Quality score: ", meta.quality_score)  # 94
print("Model used: ", meta.model_requested)  # claude-haiku-3
print("Saved (USD): ", meta.saved_usd)  # 0.043

Expected output

On your first call, expect tokens to be cut by about 60%, a quality score near 94/100, and the request routed to claude-haiku-3 for a simple Q&A query, which is 98% cheaper than Opus.
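
The sample figures in Step 3 can be checked by hand. The sketch below assumes that `compression_ratio` reports the fraction of tokens removed, which is the reading that matches the 847, 340, and 0.60 values shown in the comments. The function name `fraction_removed` is hypothetical and not part of the SDK.

```python
def fraction_removed(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of the original tokens that compression removed."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compressed_tokens / original_tokens


# Sample values from the comments in first_call.py
ratio = fraction_removed(847, 340)
print(f"{ratio:.2f}")  # 0.60
```

Under that assumption, 507 of the 847 original tokens are removed, which rounds to the 0.60 shown in the example.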

Step 4 — Pass laghav_options

Control exactly what the pipeline does with laghav_options:

with_options.py
# `client` is the LaghavClient from Step 3; `prompt` is your own prompt string.
response = client.complete(
    messages=[{"role": "user", "content": prompt}],
    model="auto",
    laghav_options={
        "compress": True,  # default True; enable compression
        "route": True,  # default True; enable model routing
        "cache": True,  # default True; enable semantic cache
        "score": True,  # default True; include quality score
        "max_aggressiveness": 0.7,  # 0.0–1.0; higher = more compression
        "skip_rules": ["intent"],  # skip specific compression rules
        "mask_pii": False,  # mask PII before sending to LLM
    },
)
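
If options are assembled in several places, it can help to build the dictionary through one function that applies the documented defaults and rejects out-of-range values before the request is sent. This is a sketch under stated assumptions: `build_options` and `DOCUMENTED_DEFAULTS` are hypothetical names, not SDK features, and only the four defaults and the 0.0–1.0 range stated in the comments above are taken from this page. Whether the service itself validates these values is not covered here.

```python
from typing import Any, Dict

# Only the defaults that this page states explicitly.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "compress": True,
    "route": True,
    "cache": True,
    "score": True,
}


def build_options(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults and check the range."""
    options = {**DOCUMENTED_DEFAULTS, **overrides}
    level = options.get("max_aggressiveness")
    if level is not None and not 0.0 <= level <= 1.0:
        raise ValueError("max_aggressiveness must be between 0.0 and 1.0")
    return options


options = build_options(max_aggressiveness=0.7, skip_rules=["intent"])
print(options["compress"], options["max_aggressiveness"])  # True 0.7
```

The result can then be passed as `laghav_options=options` in the call shown above.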

Step 5 — Streaming

streaming.py
# Streaming works exactly like OpenAI: server-sent event (SSE) chunks.
# `client` is the LaghavClient from Step 3; `prompt` is your own prompt string.
for chunk in client.complete(
    messages=[{"role": "user", "content": prompt}],
    model="auto",
    stream=True,
):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# After the loop, `chunk` is the final chunk, which includes the full laghav_meta
print()
print(chunk.laghav_meta.saved_usd)
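
When the streamed text is needed as a single string rather than printed as it arrives, the loop can be wrapped in a small helper that also keeps the final chunk. The sketch below is illustrative only: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks built with `types.SimpleNamespace` merely imitate the chunk shape used in streaming.py so the example runs without the SDK or network access.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Join the streamed text and return it with the last chunk seen."""
    parts = []
    last_chunk = None
    for chunk in chunks:
        last_chunk = chunk
        piece = chunk.choices[0].delta.content
        if piece:  # skip chunks whose content is None or empty
            parts.append(piece)
    return "".join(parts), last_chunk


def fake_chunk(content: Optional[str], meta: Any = None) -> SimpleNamespace:
    """Build a stand-in chunk with the same attribute shape as streaming.py."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], laghav_meta=meta)


demo = [
    fake_chunk("Hel"),
    fake_chunk("lo"),
    fake_chunk(None, SimpleNamespace(saved_usd=0.043)),
]
text, final = collect_stream(demo)
print(text)  # Hello
print(final.laghav_meta.saved_usd)  # 0.043
```

Returning `None` for the last chunk on an empty stream avoids the `NameError` that reading the loop variable after an empty loop would raise.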

Error handling

errors.py
import time

from laghav import LaghavClient
from laghav.errors import RateLimitError, BudgetExceededError, LaghavError

# `client` is the LaghavClient instance created in Step 3
try:
    response = client.complete(messages=[...])
except RateLimitError as e:
    # Wait for the interval the error reports, then retry the request
    time.sleep(e.retry_after)
    # retry
except BudgetExceededError as e:
    print(f"Team budget exceeded: {e.budget_id}")
except LaghavError as e:
    # Catch-all for other SDK errors
    print(f"{e.code}: {e.message}")
    print(f"Docs: {e.docs_url}")
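
The `# retry` step above can be made concrete with a bounded retry loop. The following is a minimal sketch, not the SDK's own retry mechanism: `call_with_retry` is a hypothetical helper, and the local `RateLimitError` class is a stand-in that only mirrors the `retry_after` attribute used in errors.py, so the example runs without the SDK installed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for laghav.errors.RateLimitError (assumed shape)."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited; retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, waiting `retry_after` seconds after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitError as error:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            sleep(error.retry_after)
    raise AssertionError("unreachable")


# Demo: fail twice, then succeed. The waits are recorded instead of slept.
attempts = {"count": 0}
waits = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError(retry_after=0.5)
    return "ok"


print(call_with_retry(flaky, sleep=waits.append))  # ok
print(waits)  # [0.5, 0.5]
```

In real code, the stand-in class would be replaced by the import from `laghav.errors`, and `call` would be something like `lambda: client.complete(messages=messages)`. Capping the number of attempts keeps a persistent rate limit from blocking the caller indefinitely.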

Next steps

Check out the Python SDK reference for the full API, or jump to Compression to understand all 8 compression rules.