LlamaIndex
Integrate Laghav into LlamaIndex pipelines with `LaghavCallbackHandler`, which adds automatic compression, routing, and savings tracking to every LLM call.
Installation
```bash
# Quotes keep shells such as zsh from expanding the brackets
pip install "laghav[llama-index]"
# installs: laghav + llama-index-core
```
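To confirm that the extra installed both packages into the interpreter you are using, a quick standard-library check is enough. This is a minimal sketch: `check_install` is a hypothetical helper, and it assumes the top-level import names `laghav` and `llama_index`, taken from the import statements in the basic example.

```python
import importlib.util
from typing import Dict, Iterable


def check_install(modules: Iterable[str] = ("laghav", "llama_index")) -> Dict[str, bool]:
    """Report whether each top-level package can be found (hypothetical helper)."""
    # find_spec returns None for a top-level name that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_install().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If either package reports `missing`, rerun the install command with the same Python environment that runs your pipeline.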
Basic usage
llamaindex_basic.py
```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager
from laghav.integrations.llama_index import LaghavCallbackHandler

# Register Laghav as the global callback handler
handler = LaghavCallbackHandler(
    api_key="lgh_live_...",
    compress=True,
    route=True,
)
Settings.callback_manager = CallbackManager([handler])

# Load the documents to index ("./data" is a placeholder directory)
docs = SimpleDirectoryReader("./data").load_data()

# All LlamaIndex calls are now compressed and routed through Laghav
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
result = query_engine.query("What caused the Q3 revenue drop?")
print(result)
```
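A handler registered through LlamaIndex's `CallbackManager` is notified when each event starts and again when it ends (`on_event_start` / `on_event_end`), and both notifications carry the same event id. The stdlib-only sketch below illustrates how a handler can pair those notifications to keep per-call bookkeeping. Everything in it (`SketchHandler`, `CallRecord`, `on_llm_start`, `on_llm_end`, the token counts) is hypothetical and is not Laghav's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CallRecord:
    original_tokens: int
    compressed_tokens: int


@dataclass
class SketchHandler:
    """Hypothetical handler that pairs LLM start/end events by event id."""

    pending: Dict[str, CallRecord] = field(default_factory=dict)
    completed: List[CallRecord] = field(default_factory=list)

    def on_llm_start(self, event_id: str, original_tokens: int, compressed_tokens: int) -> None:
        # Assumption: compression runs before the request is sent,
        # so both token counts are already known at start time.
        self.pending[event_id] = CallRecord(original_tokens, compressed_tokens)

    def on_llm_end(self, event_id: str) -> None:
        # The end event carries the same id, so overlapping calls are not mixed up.
        self.completed.append(self.pending.pop(event_id))

    def summary(self) -> Dict[str, int]:
        # Only calls that have finished are counted.
        return {
            "total_llm_calls": len(self.completed),
            "total_original_tokens": sum(r.original_tokens for r in self.completed),
            "total_compressed_tokens": sum(r.compressed_tokens for r in self.completed),
        }


if __name__ == "__main__":
    h = SketchHandler()
    h.on_llm_start("a", original_tokens=600, compressed_tokens=200)
    h.on_llm_start("b", original_tokens=400, compressed_tokens=150)
    h.on_llm_end("b")  # calls may finish out of order
    h.on_llm_end("a")
    print(h.summary())
    # {'total_llm_calls': 2, 'total_original_tokens': 1000, 'total_compressed_tokens': 350}
```

Keying pending records by event id rather than using a single "current call" slot is what keeps the totals correct when a query engine issues several LLM calls that overlap.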
Session summary
llamaindex_summary.py
```python
# After your LlamaIndex session completes
summary = handler.session_summary

print("Total LLM calls:", summary['total_llm_calls'])
print("Total original tokens:", summary['total_original_tokens'])
print("Total compressed tokens:", summary['total_compressed_tokens'])
print("Total saved (USD):", summary['total_saved_usd'])
print("Avg quality score:", summary['avg_quality_score'])

# Example output:
# Total LLM calls: 8
# Total original tokens: 4840
# Total compressed tokens: 1548
# Total saved (USD): 0.213
# Avg quality score: 94.5
```
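The raw totals can be turned into ratios for reporting. This is a minimal sketch, assuming `session_summary` behaves like a plain dict with the keys shown above; `savings_report` is a hypothetical helper, and the input values are the example output above.

```python
from typing import Dict, Mapping, Union

Number = Union[int, float]


def savings_report(summary: Mapping[str, Number]) -> Dict[str, float]:
    """Derive reporting ratios from a session-summary mapping (hypothetical helper)."""
    original = summary["total_original_tokens"]
    compressed = summary["total_compressed_tokens"]
    calls = summary["total_llm_calls"]
    # Fraction of prompt tokens removed by compression (0.0 if nothing was sent).
    reduction = 1 - compressed / original if original else 0.0
    # Average dollar saving per LLM call (0.0 if no calls were made).
    per_call = summary["total_saved_usd"] / calls if calls else 0.0
    return {"token_reduction": reduction, "saved_usd_per_call": per_call}


example = {
    "total_llm_calls": 8,
    "total_original_tokens": 4840,
    "total_compressed_tokens": 1548,
    "total_saved_usd": 0.213,
    "avg_quality_score": 94.5,
}
report = savings_report(example)
print(f"Token reduction: {report['token_reduction']:.1%}")  # Token reduction: 68.0%
print(f"Saved per call: ${report['saved_usd_per_call']:.4f}")  # Saved per call: $0.0266
```

For the example session, 1548 of 4840 tokens remain, so roughly 68% of prompt tokens were removed.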
Works with any LlamaIndex query engine
The callback hooks into the `LLMStartEvent` and `LLMEndEvent` events of the LlamaIndex event system, so it works with any LLM backend (such as OpenAI, Anthropic, or Ollama) and any query engine type.
Per-query settings
llamaindex_options.py
```python
from laghav.integrations.llama_index import LaghavCallbackHandler

# Override Laghav options per handler instance
handler = LaghavCallbackHandler(
    api_key="lgh_live_...",
    compress=True,
    route=True,
    max_aggressiveness=0.8,  # aggressive for document-heavy queries
    skip_rules=["intent"],
)
```
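Because these options are fixed per handler instance, one way to vary them by workload is to keep named option sets and build a separate handler from each. The sketch below is hypothetical: the preset names, the `0.3` value, the `handler_kwargs` helper, and the check that `max_aggressiveness` lies between 0 and 1 are illustrations, not documented Laghav behavior. Only the keyword names come from the example above.

```python
import copy
from typing import Any, Dict

# Hypothetical presets: keyword names mirror the example above, values are illustrative.
PRESETS: Dict[str, Dict[str, Any]] = {
    "document_heavy": {
        "compress": True,
        "route": True,
        "max_aggressiveness": 0.8,
        "skip_rules": ["intent"],
    },
    "conversational": {
        "compress": True,
        "route": True,
        "max_aggressiveness": 0.3,
        "skip_rules": [],
    },
}


def handler_kwargs(workload: str, api_key: str) -> Dict[str, Any]:
    """Return keyword arguments for one handler instance (hypothetical helper)."""
    # Deep copy so callers cannot mutate the shared preset, including its lists.
    options = copy.deepcopy(PRESETS[workload])
    level = options["max_aggressiveness"]
    # Assumption: max_aggressiveness is a fraction between 0.0 and 1.0.
    if not 0.0 <= level <= 1.0:
        raise ValueError(f"max_aggressiveness out of range: {level}")
    return {"api_key": api_key, **options}
```

With the package installed, each workload would then get its own handler instance, for example `LaghavCallbackHandler(**handler_kwargs("document_heavy", api_key="lgh_live_..."))`.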