Laghav Documentation
Everything you need to compress AI prompts, route to the cheapest capable model, and ship in under 10 minutes. One endpoint change. No refactoring.
How Laghav works
Laghav sits between your application and any LLM provider. Every call passes through a six-stage pipeline:
1. Strips filler words, preambles, and duplicated context using 8 specialized rules plus LLMLingua-2.
2. An ML classifier (DistilBERT, ONNX) maps prompt complexity to the cheapest capable model: FAQ → Haiku, code → Sonnet.
3. Semantic vector search on Redis Stack. Identical or similar queries are served instantly, at zero LLM cost.
4. A quality scorer rates the compressed prompt 0–100 before the LLM call. You set the minimum threshold.
5. PII masking (Presidio), team budget caps, audit logs, per-app API keys, and governance protocols.
6. Real-time savings dashboard by app, model, team, and compression rule, backed by a ClickHouse analytics pipeline.
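To make the flow concrete, the first four stages above can be sketched as a toy pipeline. This is an illustration only, not Laghav's implementation: the filler list, the keyword router (in place of the DistilBERT classifier), the bag-of-words "embedding" (in place of Redis Stack vector search), the 0.9 similarity threshold, the word-retention quality score, and the fallback to the uncompressed prompt on a low score are all assumptions, and every function name here is hypothetical.

```python
import math
import re

# Hypothetical filler phrases; the real rule set is not described in this page.
FILLER = re.compile(r"\b(?:please|kindly|basically|just)\b[,\s]*", re.IGNORECASE)


def compress(prompt):
    """Stage 1 stand-in: drop filler words and collapse whitespace."""
    stripped = FILLER.sub("", prompt)
    return re.sub(r"\s+", " ", stripped).strip()


def route(prompt):
    """Stage 2 stand-in: a keyword heuristic instead of an ML classifier."""
    looks_like_code = any(tok in prompt for tok in ("def ", "class ", "import ", "{"))
    return "sonnet" if looks_like_code else "haiku"


def embed(text):
    """Toy bag-of-words vector standing in for a real embedding model."""
    counts = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        counts[word] = counts.get(word, 0.0) + 1.0
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Stage 3 stand-in: in-memory similarity lookup (assumed threshold)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (vector, response) pairs

    def get(self, prompt):
        vec = embed(prompt)
        best_score, best_response = 0.0, None
        for stored_vec, response in self.entries:
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))


def quality_score(original, compressed):
    """Stage 4 stand-in: percent of the original's distinct words retained."""
    original_words = set(embed(original))
    if not original_words:
        return 100
    kept = original_words & set(embed(compressed))
    return round(100 * len(kept) / len(original_words))


def handle(prompt, cache, call_llm, min_quality=50):
    """Run compress -> route -> cache lookup -> quality gate -> LLM call."""
    compressed = compress(prompt)
    model = route(compressed)
    cached = cache.get(compressed)
    if cached is not None:
        return {"text": cached, "model": model, "cache_hit": True}
    if quality_score(prompt, compressed) < min_quality:
        compressed = prompt  # assumed fallback: send the uncompressed prompt
    text = call_llm(model, compressed)
    cache.put(compressed, text)
    return {"text": text, "model": model, "cache_hit": False}
```

Passing `call_llm` in as a plain callable keeps the sketch runnable without network access: a stub that returns a fixed string is enough to watch a second, similarly worded prompt come back as a cache hit without a second model call.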
# Before: direct Anthropic call
response = anthropic.messages.create(
    model="claude-opus-4",
    messages=[{"role": "user", "content": prompt}]
)

# After: route through Laghav
from laghav import LaghavClient

client = LaghavClient(api_key="lgh_live_xxx")
response = client.complete(
    messages=[{"role": "user", "content": prompt}],
    model="auto"  # Laghav picks cheapest capable model
)

print(response.laghav_meta.compression_ratio)  # 0.60
print(response.laghav_meta.quality_score)      # 94
print(response.laghav_meta.saved_usd)          # 0.043
Where to go next
Quickstart →
Python, TypeScript, Go, curl — get your first compressed call in 5 minutes.
API Reference →
Full POST /v1/complete spec, all laghav_options fields, and response schema.
Compression →
All 8 compression rules, aggressiveness tuning, skip_rules, and content types.
SDKs →
Python SDK, TypeScript SDK, LangChain, LlamaIndex, and CLI tool.
61% avg token reduction
94/100 avg quality score
<20ms latency overhead
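As a rough worked example of what the average token reduction could mean for a bill, the sketch below converts a reduction rate into monthly input-token savings. It assumes the 61% figure applies uniformly to input tokens and ignores any additional savings from model routing or caching; the function name, the traffic volume, and the per-million-token price are hypothetical values chosen for illustration, not published rates.

```python
def monthly_input_savings(tokens_per_month, usd_per_million_tokens, reduction=0.61):
    """Estimate USD saved per month from removing a share of input tokens."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    tokens_removed = tokens_per_month * reduction
    return tokens_removed / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: 500M input tokens per month at $3.00 per million.
estimate = monthly_input_savings(500_000_000, 3.00)
print(f"Estimated monthly input savings: ${estimate:,.2f}")
```

With these assumed inputs the estimate comes to roughly $915 per month. Actual savings depend on how compressible the prompts are and on the provider's real pricing.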