Python Quickstart
Get your first Laghav-compressed LLM call running in Python in under 5 minutes. One pip install. One endpoint change.
Step 1 — Install the SDK
bash
pip install laghav
Requires Python 3.8+. The SDK has no mandatory runtime dependencies other than httpx.
Step 2 — Get your API key
Sign up at app.laghav.ai/signup (no credit card). Your key starts with lgh_live_ for production or lgh_test_ for development.
bash
# Set once in your environment
export LAGHAV_API_KEY=lgh_live_xxxxxxxxxxxx
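Because the key prefix encodes the environment, a small guard at startup can catch a production key being used in a development script. The following is a minimal sketch: `key_environment` is a hypothetical helper, not part of the Laghav SDK, and it assumes only the two prefixes named above.

```python
def key_environment(api_key: str) -> str:
    """Return "production" or "development" based on the key prefix.

    Hypothetical helper (not part of the Laghav SDK); it relies only on
    the lgh_live_ / lgh_test_ prefixes described above.
    """
    if api_key.startswith("lgh_live_"):
        return "production"
    if api_key.startswith("lgh_test_"):
        return "development"
    raise ValueError("unrecognized key prefix; expected lgh_live_ or lgh_test_")


print(key_environment("lgh_test_xxxxxxxxxxxx"))  # development
```

In a real script you would pass `os.environ["LAGHAV_API_KEY"]` instead of the literal placeholder.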
Step 3 — Make your first call
first_call.py
import os
from laghav import LaghavClient

client = LaghavClient(api_key=os.environ["LAGHAV_API_KEY"])

response = client.complete(
    messages=[{"role": "user", "content": "Hey could you help me understand what caused the revenue drop last quarter?"}],
    model="auto",  # Laghav picks the cheapest capable model
)

# Your compressed response
print(response.choices[0].message.content)

# Savings metadata (always included)
meta = response.laghav_meta
print("Original tokens: ", meta.original_tokens)      # 847
print("Compressed tokens: ", meta.compressed_tokens)  # 340
print("Compression ratio: ", meta.compression_ratio)  # 0.60
print("Quality score: ", meta.quality_score)          # 94
print("Model used: ", meta.model_requested)           # claude-haiku-3
print("Saved (USD): ", meta.saved_usd)                # 0.043
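The sample values in the comments of first_call.py are consistent with reading compression_ratio as the fraction of tokens removed. That definition is an assumption inferred from the numbers, not something stated here; the sketch below only checks the arithmetic, and `fraction_removed` is a hypothetical name.

```python
def fraction_removed(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed by compression (assumed definition)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compressed_tokens / original_tokens


# Sample values from the comments in first_call.py
ratio = fraction_removed(847, 340)
print(round(ratio, 2))  # 0.6
```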
✦ Expected output
On your first call you'll see tokens cut by ~60%, a quality score near 94/100, and routing to claude-haiku-3 for a simple Q&A query, which is 98% cheaper than Opus.
Step 4 — Pass laghav_options
Control exactly what the pipeline does with laghav_options:
with_options.py
response = client.complete(
    messages=[{"role": "user", "content": prompt}],
    model="auto",
    laghav_options={
        "compress": True,           # default True: enable compression
        "route": True,              # default True: enable model routing
        "cache": True,              # default True: enable semantic cache
        "score": True,              # default True: include quality score
        "max_aggressiveness": 0.7,  # 0.0–1.0; higher = more compression
        "skip_rules": ["intent"],   # skip specific compression rules
        "mask_pii": False,          # mask PII before sending to LLM
    },
)
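Since laghav_options is a plain dict, a typo in a key or an out-of-range max_aggressiveness may be easy to miss. One way to catch both locally is a small validation step before the request. This is a sketch under stated assumptions: `build_options` and `KNOWN_OPTIONS` are hypothetical and not part of the SDK, and the key names and the 0.0–1.0 range are taken from the example above.

```python
from typing import Any, Dict

# Option names taken from with_options.py above.
KNOWN_OPTIONS = {
    "compress", "route", "cache", "score",
    "max_aggressiveness", "skip_rules", "mask_pii",
}


def build_options(**overrides: Any) -> Dict[str, Any]:
    """Validate option overrides locally before sending a request.

    Hypothetical helper, not part of the Laghav SDK.
    """
    unknown = set(overrides) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown laghav_options keys: {sorted(unknown)}")

    level = overrides.get("max_aggressiveness")
    if level is not None and not 0.0 <= level <= 1.0:
        raise ValueError("max_aggressiveness must be between 0.0 and 1.0")

    return dict(overrides)


options = build_options(cache=False, max_aggressiveness=0.4)
print(options)  # {'cache': False, 'max_aggressiveness': 0.4}
```

The returned dict could then be passed as `laghav_options=options` in the call shown above.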
Step 5 — Streaming
streaming.py
# Streaming works exactly like OpenAI (SSE chunks)
for chunk in client.complete(
    messages=[{"role": "user", "content": prompt}],
    model="auto",
    stream=True,
):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# Final chunk includes full laghav_meta
print()
print(chunk.laghav_meta.saved_usd)
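If you need the full text rather than incremental printing, the deltas can be collected into one string. The sketch below runs offline: `collect_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks only mimic the `choices[0].delta.content` attribute shape used in streaming.py.

```python
from types import SimpleNamespace
from typing import Iterable, List


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a stream into one string, skipping empty ones."""
    parts: List[str] = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in with the same attribute shape as the chunks in streaming.py."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [fake_chunk("Revenue "), fake_chunk(None), fake_chunk("fell.")]
print(collect_stream(stream))  # Revenue fell.
```

With a real stream you would still need to hold on to the last chunk yourself if you want its laghav_meta.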
Error handling
errors.py
import time

from laghav import LaghavClient
from laghav.errors import RateLimitError, BudgetExceededError, LaghavError

try:
    response = client.complete(messages=[...])
except RateLimitError as e:
    time.sleep(e.retry_after)
    # retry
except BudgetExceededError as e:
    print(f"Team budget exceeded: {e.budget_id}")
except LaghavError as e:
    print(f"{e.code}: {e.message}")
    print(f"Docs: {e.docs_url}")
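The "# retry" step in errors.py can be made explicit with a bounded retry loop. This is one possible sketch, not the SDK's own retry behavior: `call_with_retry` is a hypothetical helper, and the local `RateLimitError` class is a stand-in for `laghav.errors.RateLimitError` so the example runs without network access. It assumes only that the exception carries `retry_after` in seconds, as the example above suggests.

```python
import time


class RateLimitError(Exception):
    """Stand-in for laghav.errors.RateLimitError so the sketch runs offline."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited; retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(make_call, max_attempts=3, sleep=time.sleep):
    """Call make_call(), waiting e.retry_after seconds after each rate limit.

    Re-raises the last RateLimitError once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return make_call()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise
            sleep(e.retry_after)


# Simulated call: rate-limited twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError(retry_after=0.01)
    return "ok"


print(call_with_retry(flaky_call))  # ok
```

Capping the number of attempts keeps a persistent rate limit from turning into an infinite loop; with the real client, `make_call` would be a lambda wrapping `client.complete(...)`.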
ℹ Next steps
Check out the Python SDK reference for the full API, or jump to Compression to understand all 8 compression rules.