Multi-Turn Conversations

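As a conversation grows, each request carries the full history. Tokens per request therefore grow linearly with the number of turns, and cumulative token usage across the conversation grows quadratically. Laghav summarizes older turns with Haiku-3 and stores a rolling compressed history. This cuts the tokens sent per request by 82% or more on long chats (15+ turns in the benchmarks below).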
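To make the growth pattern concrete, here is a small illustrative calculation. It is a hypothetical model, not a Laghav measurement. It assumes every turn adds a fixed ~640 tokens (3,200 tokens over 5 turns, taken from the benchmark table) and that the full raw history is re-sent on every request. The per-request figures it produces line up with the unoptimized column of the benchmark table.

```python
# Hypothetical model: every turn adds a fixed ~640 tokens (3,200 / 5 turns,
# from the benchmark table) and the full raw history is re-sent each request.
TOKENS_PER_TURN = 640


def tokens_in_request(turn: int) -> int:
    """Tokens sent on request number `turn` when the whole history is re-sent."""
    return TOKENS_PER_TURN * turn  # linear in the turn number


def cumulative_tokens(turns: int) -> int:
    """Total tokens sent across all requests from turn 1 through `turns`."""
    # Sum of 640 * k for k = 1..turns = 640 * turns * (turns + 1) / 2 (quadratic)
    return TOKENS_PER_TURN * turns * (turns + 1) // 2


for n in (5, 10, 15, 25):
    print(f"turn {n}: {tokens_in_request(n)} tokens this request, "
          f"{cumulative_tokens(n)} tokens so far")
```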

How it works

Pass a stable conversation_id in laghav_options on every turn.
Laghav stores turn history in Redis (laghav:conv:{tenant_id}:{conversation_id}, TTL 1 hour).
Once the number of turns exceeds max_turns_to_keep (default: 10), the turns outside that window are summarized with Haiku-3 into a rolling summary.
The rolling summary is prepended to the request as a system instruction: [System: Context from previous turns: <summary>].
The most recent max_turns_to_keep turns (10 by default) are sent verbatim for precise context.
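The steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Laghav's implementation. An in-memory ConversationState stands in for the Redis record, and naive_summarize is a hypothetical placeholder for the Haiku-3 call. Each message counts as one turn, and the summary is refreshed every time a turn leaves the raw window. All names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Message = Dict[str, str]


def naive_summarize(previous_summary: str, turns: List[Message]) -> str:
    """Hypothetical stand-in for the Haiku-3 call: truncates and joins turns."""
    parts = [previous_summary] if previous_summary else []
    parts += [f"{m['role']}: {m['content'][:40]}" for m in turns]
    return " | ".join(parts)


@dataclass
class ConversationState:
    summary: str = ""  # rolling summary of evicted turns
    raw_turns: List[Message] = field(default_factory=list)  # recent turns, verbatim


def add_turn(state: ConversationState, message: Message, max_turns_to_keep: int = 10,
             summarize: Callable[[str, List[Message]], str] = naive_summarize) -> None:
    """Append a turn; fold any turns beyond the window into the rolling summary."""
    state.raw_turns.append(message)
    overflow = len(state.raw_turns) - max_turns_to_keep
    if overflow > 0:
        evicted = state.raw_turns[:overflow]
        state.raw_turns = state.raw_turns[overflow:]
        state.summary = summarize(state.summary, evicted)


def build_messages(state: ConversationState) -> List[Message]:
    """Prepend the summary as a system instruction, then the raw recent turns."""
    messages: List[Message] = []
    if state.summary:
        messages.append({"role": "system",
                         "content": f"[System: Context from previous turns: {state.summary}]"})
    messages.extend(state.raw_turns)
    return messages


state = ConversationState()
for i in range(1, 13):
    add_turn(state, {"role": "user", "content": f"turn {i}"})
msgs = build_messages(state)
print(len(msgs))           # 11: one summary message plus 10 raw turns
print(msgs[0]["content"])  # [System: Context from previous turns: user: turn 1 | user: turn 2]
```

Folding the previous summary into each new summary keeps the stored context bounded: the request always holds one summary message plus at most max_turns_to_keep raw turns, however long the conversation runs.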

Using conversation optimization

conversation.py

import uuid
from laghav import LaghavClient
 
client = LaghavClient()
conversation_id = str(uuid.uuid4())  # generate once per conversation session
 
# Turn 1
r1 = client.complete(
    messages=[{"role": "user", "content": "Let's discuss Q3 revenue performance."}],
    model="auto",
    laghav_options={"conversation_id": conversation_id, "max_turns_to_keep": 10}
)
 
# Turn 2: send only the new message; Laghav fetches the stored history from Redis automatically
r2 = client.complete(
    messages=[{"role": "user", "content": "What were the main cost drivers?"}],
    model="auto",
    laghav_options={"conversation_id": conversation_id}
)
 
print(r2.laghav_meta.conversation_id)   # same conversation_id echoed back
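# From turn 11 onward (past max_turns_to_keep=10), older turns are summarized,
# so far fewer tokens reach the LLM (about 83% fewer by turn 15 in the benchmarks below)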

Performance benchmarks

Turns	Tokens per request (without optimization)	Tokens per request (with optimization)	Reduction
5 turns	~3,200 tokens	~3,200 tokens	0% (below threshold)
10 turns	~6,400 tokens	~6,400 tokens	0% (at threshold)
15 turns	~9,600 tokens	~1,650 tokens	82.7%
25 turns	~16,000 tokens	~2,100 tokens	86.9%
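The Reduction column is the share of tokens saved relative to the unoptimized request. A minimal sketch of that arithmetic follows. The function name reduction_pct is illustrative, not part of the Laghav API. Because the table values are rounded (~), recomputing from them may differ slightly from the published percentages.

```python
def reduction_pct(without_tokens: int, with_tokens: int) -> float:
    """Percentage of tokens saved: (without - with) / without * 100."""
    if without_tokens <= 0:
        raise ValueError("without_tokens must be positive")
    # Multiply before dividing so integer inputs stay exact as long as possible
    return (without_tokens - with_tokens) * 100 / without_tokens


print(f"{reduction_pct(16_000, 2_100):.1f}%")  # 25-turn row: 86.9%
print(f"{reduction_pct(6_400, 6_400):.1f}%")   # 10-turn row: 0.0%
```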

Note: Conversation history is tenant-isolated

Conversation history stored in Redis is always scoped to your tenant_id. Conversations expire after 1 hour of inactivity (TTL 3600s).
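The isolation and expiry rules follow from the documented key layout and TTL. The sketch below models both with a toy in-memory store so it runs without a Redis server. It assumes each turn rewrites the stored history, which resets the expiry. conversation_key and InMemoryTTLStore are hypothetical helpers. In Redis itself, the equivalent write would be a SET with an expiry, for example redis-py's set(key, value, ex=3600).

```python
import time
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 3600  # documented 1-hour inactivity TTL


def conversation_key(tenant_id: str, conversation_id: str) -> str:
    """Hypothetical helper for the documented layout laghav:conv:{tenant_id}:{conversation_id}."""
    return f"laghav:conv:{tenant_id}:{conversation_id}"


class InMemoryTTLStore:
    """Toy Redis stand-in: every set() restarts the key's TTL; get() does not."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, ttl: int = TTL_SECONDS) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # evict lazily once the TTL has elapsed
            return None
        return value


now = [0.0]  # fake clock so expiry can be demonstrated instantly
store = InMemoryTTLStore(clock=lambda: now[0])
key_a = conversation_key("tenant-a", "conv-1")
store.set(key_a, "history")
print(store.get(conversation_key("tenant-b", "conv-1")))  # None: another tenant's key
now[0] = 3599.0
print(store.get(key_a))  # history: still inside the hour
now[0] = 3600.0
print(store.get(key_a))  # None: 3600 s passed with no new turn
```

Because tenant_id is part of the key, two tenants that happen to reuse the same conversation_id still read and write separate records.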
