Python Quickstart

Get your first Laghav-compressed LLM call running in Python in under 5 minutes. One pip install. One endpoint change.

Step 1 — Install the SDK

bash

pip install laghav

Requires Python 3.8+. The SDK has no mandatory runtime dependencies other than httpx.

Step 2 — Get your API key

Sign up at app.laghav.ai/signup (no credit card required). Your key starts with lgh_live_ for production or lgh_test_ for development.

bash

# Set once in your environment
export LAGHAV_API_KEY=lgh_live_xxxxxxxxxxxx
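
Because the key prefix tells you which environment a key belongs to, it can help to check it before making any calls. The following is a minimal sketch, not part of the SDK: key_environment and load_api_key are hypothetical helper names, and the only facts taken from this page are the two prefixes and the LAGHAV_API_KEY variable name.

```python
import os

# Prefixes taken from the text above; the helper names are illustrative only.
KEY_PREFIXES = {"lgh_live_": "production", "lgh_test_": "development"}


def key_environment(api_key: str) -> str:
    """Return which environment a key belongs to, based on its prefix."""
    for prefix, environment in KEY_PREFIXES.items():
        if api_key.startswith(prefix) and len(api_key) > len(prefix):
            return environment
    raise ValueError("API key must start with lgh_live_ or lgh_test_")


def load_api_key(env_var: str = "LAGHAV_API_KEY") -> str:
    """Read the key from the environment; fail early if missing or malformed."""
    api_key = os.environ.get(env_var, "")
    if not api_key:
        raise RuntimeError(f"{env_var} is not set")
    key_environment(api_key)  # raises ValueError on an unexpected prefix
    return api_key
```

Failing at startup with a clear message is usually easier to debug than an authentication error on the first request.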

Step 3 — Make your first call

first_call.py

import os
from laghav import LaghavClient
 
client = LaghavClient(api_key=os.environ["LAGHAV_API_KEY"])
 
response = client.complete(
    messages=[{"role": "user", "content": "Hey could you help me understand what caused the revenue drop last quarter?"}],
    model="auto",   # Laghav picks the cheapest capable model
)
 
# The model's reply
print(response.choices[0].message.content)
 
# Savings metadata — always included
meta = response.laghav_meta
print("Original tokens:   ", meta.original_tokens)    # 847
print("Compressed tokens: ", meta.compressed_tokens)  # 340
print("Compression ratio: ", meta.compression_ratio)  # 0.60
print("Quality score:     ", meta.quality_score)      # 94
print("Model used:        ", meta.model_requested)    # claude-haiku-3
print("Saved (USD):       ", meta.saved_usd)          # 0.043

Expected output

On your first call you should see the token count cut by roughly 60%, a quality score near 94/100, and the request routed to claude-haiku-3 for a simple Q&A query, which is 98% cheaper than Opus.
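
The sample figures in Step 3 are consistent with reading compression_ratio as the fraction of tokens removed. That definition is an assumption inferred from the numbers on this page, not something the SDK reference confirms here. A short worked check:

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed (assumed definition, inferred from the sample)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compressed_tokens / original_tokens


# Sample values from Step 3: 847 original tokens, 340 after compression.
ratio = compression_ratio(847, 340)
print(f"{ratio:.2f}")  # 0.60
```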

Step 4 — Pass laghav_options

Control exactly what the pipeline does with laghav_options. The snippets from here on reuse client from Step 3 and assume prompt holds your prompt string:

with_options.py

response = client.complete(
    messages=[{"role": "user", "content": prompt}],
    model="auto",
    laghav_options={
        "compress": True,           # default True — enable compression
        "route": True,              # default True — enable model routing
        "cache": True,              # default True — enable semantic cache
        "score": True,              # default True — include quality score
        "max_aggressiveness": 0.7,  # 0.0–1.0; higher = more compression
        "skip_rules": ["intent"],   # skip specific compression rules
        "mask_pii": False,          # mask PII before sending to LLM
    }
)
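
If you set these options in several places, a small builder keeps the documented defaults in one spot and rejects out-of-range values before a request is sent. This is a sketch only: build_laghav_options is a hypothetical helper, not an SDK function. It fills in just the four defaults stated in the comments above and passes max_aggressiveness, skip_rules, and mask_pii through only when you supply them, since their defaults are not stated on this page.

```python
from typing import Any, Dict, List, Optional

# Defaults as documented in the comments of with_options.py.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "compress": True,
    "route": True,
    "cache": True,
    "score": True,
}


def build_laghav_options(
    max_aggressiveness: Optional[float] = None,
    skip_rules: Optional[List[str]] = None,
    mask_pii: Optional[bool] = None,
    **toggles: bool,
) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(toggles) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    options = {**DOCUMENTED_DEFAULTS, **toggles}
    if max_aggressiveness is not None:
        # The page documents a 0.0-1.0 range; higher means more compression.
        if not 0.0 <= max_aggressiveness <= 1.0:
            raise ValueError("max_aggressiveness must be between 0.0 and 1.0")
        options["max_aggressiveness"] = max_aggressiveness
    if skip_rules is not None:
        options["skip_rules"] = list(skip_rules)
    if mask_pii is not None:
        options["mask_pii"] = mask_pii
    return options


options = build_laghav_options(
    max_aggressiveness=0.7, skip_rules=["intent"], cache=False
)
```

The resulting dict can then be passed as laghav_options= in the call shown above.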

Step 5 — Streaming

streaming.py

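# Streaming works like OpenAI's API: the response arrives as SSE chunks
last_chunk = None
for chunk in client.complete(
    messages=[{"role": "user", "content": prompt}],
    model="auto",
    stream=True,
):
    last_chunk = chunk  # remember the most recent chunk for the metadata below
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# The final chunk includes the full laghav_meta
print()
if last_chunk is not None:  # guard against an empty stream
    print(last_chunk.laghav_meta.saved_usd)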
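
When you need the full text rather than incremental printing, the same loop can be wrapped in a function that returns both the joined text and the metadata. The sketch below is illustrative: collect_stream and fake_chunk are hypothetical names, and the stand-in chunks only mimic the attribute shape used above (choices[0].delta.content and laghav_meta) so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Join streamed text and return it with the latest laghav_meta (or None)."""
    parts = []
    meta = None
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
        # Keep the most recent metadata; per the text, the final chunk carries it.
        meta = getattr(chunk, "laghav_meta", None) or meta
    return "".join(parts), meta


def fake_chunk(text, meta=None):
    """Stand-in for an SDK chunk, used only to exercise the helper offline."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], laghav_meta=meta)


stream = [
    fake_chunk("Revenue "),
    fake_chunk("dropped."),
    fake_chunk(None, meta=SimpleNamespace(saved_usd=0.043)),
]
text, meta = collect_stream(stream)
print(text)            # Revenue dropped.
print(meta.saved_usd)  # 0.043
```

With the real client, you would pass the iterator returned by client.complete(..., stream=True) in place of the stand-in list.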

Error handling

errors.py

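import os
import time

from laghav import LaghavClient
from laghav.errors import RateLimitError, BudgetExceededError, LaghavError

client = LaghavClient(api_key=os.environ["LAGHAV_API_KEY"])

try:
    response = client.complete(messages=[...])
except RateLimitError as e:
    # Wait for the server-suggested delay, then retry once
    time.sleep(e.retry_after)
    response = client.complete(messages=[...])
except BudgetExceededError as e:
    print(f"Team budget exceeded: {e.budget_id}")
except LaghavError as e:
    # Catch-all for any other SDK error; keep this handler last
    print(f"{e.code}: {e.message}")
    print(f"Docs: {e.docs_url}")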
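
A single retry does not cover repeated rate limiting, so a bounded retry loop is a common next step. The following is a generic sketch rather than an SDK feature: call_with_retry is a hypothetical helper, and FakeRateLimit is a local stand-in for RateLimitError so the example runs offline. The only SDK detail it relies on is the retry_after attribute shown above.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Type[Exception],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping for the exception's retry_after between failed attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller handle it
            # Fall back to 1 second if the exception carries no retry_after.
            sleep(getattr(exc, "retry_after", 1.0))
    raise ValueError("max_attempts must be at least 1")


class FakeRateLimit(Exception):
    """Stand-in for laghav.errors.RateLimitError so the sketch runs offline."""
    retry_after = 0.5


attempts = []
waits = []


def flaky():
    """Fail twice with a rate-limit error, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeRateLimit()
    return "ok"


# Record the waits instead of sleeping, to keep the demo instant.
result = call_with_retry(flaky, FakeRateLimit, sleep=waits.append)
print(result, len(attempts), waits)  # ok 3 [0.5, 0.5]
```

With the real SDK, the call would look like call_with_retry(lambda: client.complete(messages=[...]), RateLimitError), leaving BudgetExceededError and other LaghavError cases to propagate to your own handlers.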

Next steps

Check out the Python SDK reference for the full API, or jump to Compression to understand all 8 compression rules.
