Multi-Turn Conversations

As a conversation grows, every request resends the full history, so the token count climbs with each turn. Laghav summarizes older turns with Haiku-3 and stores a rolling compressed history, reducing token usage by 82% or more on long chats.

How it works

  1. Pass a stable conversation_id in laghav_options on every turn.
  2. Laghav stores turn history in Redis (laghav:conv:{tenant_id}:{conversation_id}, TTL 1 hour).
  3. Once turns exceed max_turns_to_keep (default: 10), older turns are summarized with Haiku-3.
  4. The rolling summary is prepended as a system instruction: [System: Context from previous turns: <summary>].
  5. The most recent max_turns_to_keep turns (10 by default) remain in raw format for precise context.
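
The steps above can be sketched in a few lines of Python. This is an illustrative approximation, not Laghav's implementation: compress_history, history_key, and fake_summarize are hypothetical names, each message is counted as one turn, and the Haiku-3 call is replaced by a stub so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


def history_key(tenant_id: str, conversation_id: str) -> str:
    """Build the Redis key in the format given in step 2."""
    return f"laghav:conv:{tenant_id}:{conversation_id}"


def compress_history(
    turns: list[Message],
    summarize: Callable[[list[Message]], str],
    max_turns_to_keep: int = 10,
) -> list[Message]:
    """Return a summary message for older turns followed by the most recent raw turns."""
    if len(turns) <= max_turns_to_keep:
        return list(turns)  # at or below the threshold: nothing is summarized
    split = len(turns) - max_turns_to_keep
    older, recent = turns[:split], turns[split:]
    header = {
        "role": "system",
        "content": f"[System: Context from previous turns: {summarize(older)}]",
    }
    return [header, *recent]


def fake_summarize(older: list[Message]) -> str:
    # Hypothetical stand-in for the summarization model call.
    return f"{len(older)} earlier turns about Q3 revenue"


turns = [{"role": "user", "content": f"message {i}"} for i in range(1, 13)]
compressed = compress_history(turns, fake_summarize)
print(history_key("acme", "1234"))
print(len(compressed))
print(compressed[0]["content"])
```

With 12 turns and the default threshold of 10, the two oldest turns are folded into the summary message and the remaining 10 are passed through unchanged.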

Using conversation optimization

conversation.py
import uuid

from laghav import LaghavClient

client = LaghavClient()
conversation_id = str(uuid.uuid4())  # generate once per conversation session

# Turn 1
r1 = client.complete(
    messages=[{"role": "user", "content": "Let's discuss Q3 revenue performance."}],
    model="auto",
    laghav_options={"conversation_id": conversation_id, "max_turns_to_keep": 10},
)

# Turn 2: Laghav fetches history from Redis automatically
r2 = client.complete(
    messages=[{"role": "user", "content": "What were the main cost drivers?"}],
    model="auto",
    laghav_options={"conversation_id": conversation_id},
)

print(r2.laghav_meta.conversation_id)  # same conversation_id echoed back
# After turn 11: tokens sent to the LLM drop by ~80%

Performance benchmarks

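| Turns    | Without optimization | With optimization | Reduction              |
|----------|----------------------|-------------------|------------------------|
| 5 turns  | ~3,200 tokens        | ~3,200 tokens     | 0% (below threshold)   |
| 10 turns | ~6,400 tokens        | ~6,400 tokens     | 0% (at threshold)      |
| 15 turns | ~9,600 tokens        | ~1,650 tokens     | 82.7%                  |
| 25 turns | ~16,000 tokens       | ~2,100 tokens     | 86.9%                  |
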
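
The reduction figures follow from simple arithmetic on the two token counts. The snippet below is a small sketch that recomputes them from the rounded values in the benchmark; reduction_pct is a hypothetical helper name. Because the token counts are approximate, the result for 15 turns comes out a tenth of a point above the published 82.7%, which presumably was computed from unrounded counts.

```python
def reduction_pct(baseline_tokens: int, optimized_tokens: int) -> float:
    """Percentage of tokens saved relative to the unoptimized request."""
    return (baseline_tokens - optimized_tokens) / baseline_tokens * 100


# (turns, tokens without optimization, tokens with optimization)
rows = [(5, 3200, 3200), (10, 6400, 6400), (15, 9600, 1650), (25, 16000, 2100)]

for turns, baseline, optimized in rows:
    print(f"{turns} turns: {reduction_pct(baseline, optimized):.1f}% reduction")
```

The unoptimized column also works out to about 640 tokens per turn at every row, so the baseline grows linearly with conversation length.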
Conversation history is tenant-isolated
Conversation history stored in Redis is always scoped to your tenant_id. Conversations expire after 1 hour of inactivity (TTL 3600s).
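
To make the scoping and expiry behavior concrete, here is a minimal in-memory sketch. It is not Laghav's storage layer: ExpiringHistoryStore is a hypothetical class standing in for Redis, and it assumes that each write to a conversation resets the one-hour timer (the source only says "inactivity", so whether reads also reset it is unknown).

```python
import time
from typing import Callable

TTL_SECONDS = 3600  # 1 hour, matching the TTL stated above


class ExpiringHistoryStore:
    """In-memory stand-in for Redis: a key expires TTL_SECONDS after its last write."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[float, list[str]]] = {}

    @staticmethod
    def key(tenant_id: str, conversation_id: str) -> str:
        # The tenant_id is part of the key, so tenants never share history.
        return f"laghav:conv:{tenant_id}:{conversation_id}"

    def get(self, tenant_id: str, conversation_id: str) -> list[str]:
        k = self.key(tenant_id, conversation_id)
        entry = self._data.get(k)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(k, None)  # drop expired entries lazily
            return []
        return list(entry[1])

    def append(self, tenant_id: str, conversation_id: str, turn: str) -> None:
        turns = self.get(tenant_id, conversation_id)
        turns.append(turn)
        # Assumption: every write pushes the expiry out by a full TTL.
        expires_at = self._clock() + TTL_SECONDS
        self._data[self.key(tenant_id, conversation_id)] = (expires_at, turns)


# A controllable clock makes the expiry behavior easy to demonstrate.
now = [0.0]
store = ExpiringHistoryStore(clock=lambda: now[0])
store.append("tenant-a", "c1", "hello")
now[0] = 3000.0
store.append("tenant-a", "c1", "again")   # activity resets the timer
now[0] = 6000.0
print(store.get("tenant-a", "c1"))        # still present: last write was 3000s ago
print(store.get("tenant-b", "c1"))        # a different tenant sees nothing
now[0] = 6600.0
print(store.get("tenant-a", "c1"))        # a full hour since the last write: expired
```

In practice this means a client that resumes a conversation after more than an hour of silence should expect to start from an empty history under the same conversation_id.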