Multi-Turn Conversations
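As a conversation grows, every request resends all earlier turns, so the number of tokens sent per request rises linearly with each turn. Laghav summarizes older turns with Haiku-3 and stores a rolling compressed history, reducing token usage by more than 82% on long chats (15 turns or more in the benchmarks below).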
How it works
- Pass a stable `conversation_id` in `laghav_options` on every turn.
- Laghav stores turn history in Redis (`laghav:conv:{tenant_id}:{conversation_id}`, TTL 1 hour).
- Once the number of turns exceeds `max_turns_to_keep` (default: 10), older turns are summarized with Haiku-3.
- The rolling summary is prepended as a system instruction: `[System: Context from previous turns: <summary>]`.
- The most recent `max_turns_to_keep` turns (10 by default) remain in raw format for precise context.
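The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Laghav's implementation: `compress_history` and `fake_summarize` are hypothetical names, each message is treated as one turn, and the summarizer is a stand-in for the Haiku-3 call. For simplicity the sketch re-summarizes all older turns on every call instead of folding new turns into a stored rolling summary.

```python
from typing import Callable

Message = dict[str, str]


def compress_history(
    turns: list[Message],
    summarize: Callable[[list[Message]], str],
    max_turns_to_keep: int = 10,
) -> list[Message]:
    """Return the messages to send: a summary of older turns plus recent raw turns."""
    if len(turns) <= max_turns_to_keep:
        return list(turns)  # at or below the threshold: send everything as-is

    # Split by index so that max_turns_to_keep=0 is also handled correctly.
    split = len(turns) - max_turns_to_keep
    older, recent = turns[:split], turns[split:]

    system_msg = {
        "role": "system",
        "content": f"[System: Context from previous turns: {summarize(older)}]",
    }
    return [system_msg] + recent


def fake_summarize(older: list[Message]) -> str:
    # Hypothetical stand-in for the LLM summarization call.
    return f"summary of {len(older)} earlier turns"


turns = [{"role": "user", "content": f"turn {i}"} for i in range(1, 16)]
compressed = compress_history(turns, fake_summarize)
print(len(turns), "->", len(compressed), "messages")
```

With 15 turns and the default threshold, the five oldest turns collapse into a single system message and the last ten are passed through unchanged.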
Using conversation optimization
conversation.py
```python
import uuid

from laghav import LaghavClient

client = LaghavClient()
conversation_id = str(uuid.uuid4())  # generate once per conversation session

# Turn 1
r1 = client.complete(
    messages=[{"role": "user", "content": "Let's discuss Q3 revenue performance."}],
    model="auto",
    laghav_options={"conversation_id": conversation_id, "max_turns_to_keep": 10},
)

# Turn 2: Laghav fetches history from Redis automatically
r2 = client.complete(
    messages=[{"role": "user", "content": "What were the main cost drivers?"}],
    model="auto",
    laghav_options={"conversation_id": conversation_id},
)

print(r2.laghav_meta.conversation_id)  # same conversation_id echoed back
# After turn 11: tokens sent to the LLM drop by ~80%
```
Performance benchmarks
| Turns | Without optimization | With optimization | Reduction |
|---|---|---|---|
| 5 turns | ~3,200 tokens | ~3,200 tokens | 0% (below threshold) |
| 10 turns | ~6,400 tokens | ~6,400 tokens | 0% (at threshold) |
| 15 turns | ~9,600 tokens | ~1,650 tokens | 82.7% |
| 25 turns | ~16,000 tokens | ~2,100 tokens | 86.9% |
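The Reduction column follows from the two token columns as `1 - with / without`. The snippet below is a small sketch of that arithmetic using the rounded figures from the table; `reduction_pct` is a hypothetical helper, and because the token counts are approximate, the computed values can differ from the published percentages by about a tenth of a point.

```python
def reduction_pct(without_tokens: int, with_tokens: int) -> float:
    """Percentage of tokens saved relative to the unoptimized request."""
    return (1 - with_tokens / without_tokens) * 100


# (tokens without optimization, tokens with optimization), taken from the table
benchmarks = {
    5: (3200, 3200),
    10: (6400, 6400),
    15: (9600, 1650),
    25: (16000, 2100),
}

for turn_count, (without, with_opt) in benchmarks.items():
    saved = reduction_pct(without, with_opt)
    print(f"{turn_count} turns: {saved:.1f}% fewer tokens")
```

The unoptimized column grows by roughly 640 tokens per turn, while the optimized column stays near the cost of ten raw turns plus a summary. This is why the reduction keeps improving as conversations get longer.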
ℹ Conversation history is tenant-isolated
Conversation history stored in Redis is always scoped to your `tenant_id`. Conversations expire after 1 hour of inactivity (TTL 3600s).
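To make the scoping and expiry rules concrete, here is a minimal sketch that uses an in-memory stand-in instead of Redis. The key format and the 3600-second TTL come from this page; `conversation_key`, `ExpiringStore`, and the choice to reset the TTL on every write (one way to get "1 hour of inactivity") are assumptions made for illustration, not Laghav's actual code.

```python
import time

TTL_SECONDS = 3600  # 1 hour, as stated above


def conversation_key(tenant_id: str, conversation_id: str) -> str:
    """Build the tenant-scoped key in the documented format."""
    return f"laghav:conv:{tenant_id}:{conversation_id}"


class ExpiringStore:
    """Tiny in-memory stand-in for Redis with per-key expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (expires_at, list of turns)

    def get_turns(self, key: str) -> list:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(key, None)  # missing or expired
            return []
        return list(entry[1])

    def append_turn(self, key: str, turn: str) -> None:
        turns = self.get_turns(key)
        turns.append(turn)
        # Assumption: each write resets the TTL, so expiry tracks inactivity.
        self._data[key] = (self._clock() + TTL_SECONDS, turns)


# Demo with a controllable clock so no real waiting is needed.
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
key_a = conversation_key("tenant-a", "conv-1")
key_b = conversation_key("tenant-b", "conv-1")

store.append_turn(key_a, "hello")
other_tenant_view = store.get_turns(key_b)  # same conversation_id, different tenant

now[0] = 3000.0
store.append_turn(key_a, "again")           # activity resets the TTL
now[0] = 6000.0
still_alive = store.get_turns(key_a)
now[0] = 6601.0
after_expiry = store.get_turns(key_a)
```

Because the `tenant_id` is part of the key, two tenants that happen to reuse the same `conversation_id` never read each other's history.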