Quality Scoring
Before every LLM call, Laghav rates the compressed prompt on a 0–100 scale. If quality falls below your threshold, compression is rolled back automatically.
How scores are computed
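The quality scorer embeds the original and compressed prompts with sentence-transformers and computes the cosine similarity between the two embeddings, reported on a 0–100 scale. A score of 94/100 means the compressed prompt retains about 94% of the semantic content of the original, as measured by that similarity. Scores below the rollback threshold (75 in the table below) trigger automatic compression rollback.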
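To make the scoring idea concrete, here is a minimal sketch of turning two embedding vectors into a 0–100 score. It is not Laghav's implementation: the function names are hypothetical, the clamping of negative similarities to 0 is an assumption, and the embedding model is not specified in this page. In practice the vectors would come from an embedding model rather than being written by hand.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def quality_score(original_emb: Sequence[float], compressed_emb: Sequence[float]) -> float:
    """Hypothetical helper: map cosine similarity onto a 0-100 scale.

    Assumption: similarities below 0 are clamped to 0, so the result
    always lies in [0, 100].
    """
    sim = cosine_similarity(original_emb, compressed_emb)
    return 100.0 * max(0.0, min(1.0, sim))
```

Identical embeddings score 100, and orthogonal embeddings score 0 under these assumptions.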
| Score range | Meaning | Action |
|---|---|---|
| 95–100 | Excellent — no semantic loss | Proceed with compressed prompt |
| 85–94 | Good — minimal semantic loss | Proceed with compressed prompt |
| 75–84 | Acceptable — some context trimmed | Proceed with compressed prompt |
| < 75 | Poor — significant context lost | Laghav rolls back to original prompt |
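The bands in the table can be expressed as a small lookup, which may be handy for logging or dashboards. This is a sketch only: `classify_score` and `Band` are hypothetical names, and treating each band's lower bound as inclusive for fractional scores (so 94.5 counts as "Good") is an assumption the table does not spell out.

```python
from typing import NamedTuple


class Band(NamedTuple):
    label: str
    action: str


def classify_score(score: float) -> Band:
    """Map a 0-100 quality score to the band and action from the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 95:
        return Band("Excellent", "proceed")
    if score >= 85:
        return Band("Good", "proceed")
    if score >= 75:
        return Band("Acceptable", "proceed")
    # Below 75 the table says Laghav rolls back to the original prompt.
    return Band("Poor", "rollback")
```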
Reading the quality score
scoring.py
```python
# Assumes `client` and `messages` are already defined.
response = client.complete(messages=messages, model="auto")
score = response.laghav_meta.quality_score
print(f"Quality score: {score}/100")

# Score is always based on the compressed output actually sent.
# If compression was rolled back, score reflects the uncompressed prompt (100/100).
```
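If your application wants a stricter bar than the rollback threshold, one option is to check the score yourself after each call. The sketch below assumes only the `laghav_meta.quality_score` attribute shown above; the helper name, the default floor of 85, and the handling of a missing score (for example when scoring is disabled) are assumptions, not documented behavior.

```python
from types import SimpleNamespace
from typing import Optional


def meets_quality_floor(response, floor: float = 85.0) -> Optional[bool]:
    """Return True/False if a score is present, or None if it is missing.

    Assumption: a missing or None score means scoring did not run.
    """
    meta = getattr(response, "laghav_meta", None)
    score = getattr(meta, "quality_score", None)
    if score is None:
        return None
    return score >= floor


# Stand-in object for illustration; a real response comes from client.complete().
fake_response = SimpleNamespace(laghav_meta=SimpleNamespace(quality_score=94))
ok = meets_quality_floor(fake_response)
```

Note that a score of 100 can mean either lossless compression or a rollback, so this check alone cannot tell the two apart.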
Disabling scoring
no_score.py
```python
# Skip scoring for ~2ms latency reduction (not recommended for production)
response = client.complete(
    messages=messages,
    model="auto",
    laghav_options={"score": False},
)
```
⚠ Score vs compression tradeoff
Higher `max_aggressiveness` values compress more tokens but may lower quality scores. For production, keep aggressiveness at 0.5–0.7 unless you have validated higher values on your specific data.
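One way to validate higher aggressiveness values on your own data is to sweep candidate settings over a sample of real prompts and keep the highest one whose scores stay acceptable. The sketch below is hypothetical throughout: `score_fn` is a callable you supply that runs a prompt at a given aggressiveness and returns its quality score, and the default cutoffs (mean of 85, per-prompt minimum of 75) are illustrative choices, not Laghav defaults.

```python
from statistics import mean
from typing import Callable, Iterable, Optional, Sequence


def highest_safe_aggressiveness(
    prompts: Sequence[str],
    candidates: Iterable[float],
    score_fn: Callable[[str, float], float],
    min_mean_score: float = 85.0,
    min_single_score: float = 75.0,
) -> Optional[float]:
    """Return the largest candidate that passes both cutoffs, or None."""
    if not prompts:
        raise ValueError("need at least one sample prompt")
    best: Optional[float] = None
    for level in sorted(candidates):
        scores = [score_fn(prompt, level) for prompt in prompts]
        # A level passes only if the average and the worst case both hold up.
        if mean(scores) >= min_mean_score and min(scores) >= min_single_score:
            best = level
    return best
```

Checking the worst-case score as well as the mean guards against a setting that looks fine on average but badly degrades a few prompts.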