LangChain
Integrate Laghav into any LangChain pipeline using LaghavCallbackHandler (token tracking) and LaghavContextCompressor (RAG document compression).
Installation
bash
pip install "laghav[langchain]"  # installs: laghav + langchain-core
LaghavCallbackHandler
Pass the handler to any LangChain LLM or chain via the callbacks parameter. It intercepts every LLM call, compresses the prompt and routes it through Laghav, and records savings metadata.
langchain_basic.py
from langchain.chat_models import ChatAnthropic
from laghav.integrations.langchain import LaghavCallbackHandler

llm = ChatAnthropic(model="claude-opus-4")

handler = LaghavCallbackHandler(
    api_key="lgh_live_...",
    compress=True,
    route=True,
    max_aggressiveness=0.6,
)

response = llm.predict("Summarize the revenue report", callbacks=[handler])

# After the call
print(handler.last_meta.saved_usd)       # $0.043
print(handler.last_meta.quality_score)   # 94
print(handler.last_meta.routing_reason)  # "analytical"
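The examples on this page hard-code the API key for brevity. A minimal sketch of reading it from the environment instead, assuming a variable named LAGHAV_API_KEY (a hypothetical name chosen for illustration, not one documented by Laghav):

```python
import os


def load_laghav_api_key(env_var: str = "LAGHAV_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the handler")
    return key
```

The result can then be passed as api_key when constructing the handler, which keeps the key out of source control.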
Agent tracking (session_summary)
langchain_agent.py
from langchain.agents import AgentExecutor, create_react_agent
from laghav.integrations.langchain import LaghavCallbackHandler

handler = LaghavCallbackHandler(api_key="lgh_live_...")

# `agent` and `tools` are assumed to be defined already,
# e.g. agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, callbacks=[handler])
result = agent_executor.run("Research and summarize Q3 revenue trends")

# After the agent completes all steps
summary = handler.session_summary
print(f"Total calls: {summary['total_llm_calls']}")
print(f"Tokens saved: {summary['total_original_tokens'] - summary['total_compressed_tokens']}")
print(f"Total saved ($): {summary['total_saved_usd']:.3f}")
print(f"Avg quality: {summary['avg_quality_score']}")
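To make the shape of session_summary concrete, here is a minimal sketch of how per-call metadata could be rolled up into the keys printed above. The CallMeta class, its token fields, and the summarize function are hypothetical stand-ins written for this example; Laghav's actual internals may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class CallMeta:
    """Hypothetical per-call record; the field names are assumptions."""
    original_tokens: int
    compressed_tokens: int
    saved_usd: float
    quality_score: float


def summarize(calls: List[CallMeta]) -> Dict[str, Union[int, float]]:
    """Aggregate per-call records into a dict using the session_summary keys."""
    n = len(calls)
    return {
        "total_llm_calls": n,
        "total_original_tokens": sum(c.original_tokens for c in calls),
        "total_compressed_tokens": sum(c.compressed_tokens for c in calls),
        "total_saved_usd": sum(c.saved_usd for c in calls),
        # Avoid dividing by zero when no calls were recorded.
        "avg_quality_score": sum(c.quality_score for c in calls) / n if n else 0.0,
    }


calls = [
    CallMeta(original_tokens=1200, compressed_tokens=500, saved_usd=0.25, quality_score=94),
    CallMeta(original_tokens=800, compressed_tokens=300, saved_usd=0.5, quality_score=90),
]
summary = summarize(calls)
print(summary["total_original_tokens"] - summary["total_compressed_tokens"])  # 1200
```

Because an agent run may issue many LLM calls, totals are summed while the quality score is averaged, which matches how the fields are named in the summary.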
RAG context compression
Use LaghavContextCompressor to compress retrieved documents before feeding them to the LLM in a RAG pipeline:
langchain_rag.py
from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from laghav.integrations.langchain import LaghavContextCompressor

compressor = LaghavContextCompressor(
    api_key="lgh_live_...",
    max_tokens_per_chunk=200,  # max tokens per document chunk
)

# Wraps any LangChain retriever (`your_retriever` is assumed to exist already)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=your_retriever,
)

# Now use in a RetrievalQA chain (`llm` is any LangChain LLM, defined elsewhere)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
result = chain.run("What caused the revenue decline?")
✦ Savings scale with RAG
RAG pipelines commonly retrieve 5–10 document chunks per query. Laghav compresses each chunk, typically reducing the total context by 60–80% before the generation step. Combined with routing, this often cuts RAG query costs by 85% or more.
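The arithmetic behind the combined figure can be sketched as follows, assuming query cost is proportional to context tokens times the per-token price. The 0.7 reduction is the midpoint of the range above; the 0.5 price ratio for the routed model is an illustrative assumption, not a Laghav figure.

```python
def combined_savings(context_reduction: float, routed_price_ratio: float) -> float:
    """Fraction of cost saved when compression and routing are combined.

    context_reduction: fraction of context tokens removed (0.7 means 70% smaller).
    routed_price_ratio: routed model's per-token price divided by the original
        model's price (hypothetical; 1.0 means no routing benefit).
    Assumes cost = tokens * price, ignoring output tokens and fixed fees.
    """
    if not 0.0 <= context_reduction <= 1.0:
        raise ValueError("context_reduction must be between 0 and 1")
    if routed_price_ratio < 0.0:
        raise ValueError("routed_price_ratio must be non-negative")
    remaining_cost = (1.0 - context_reduction) * routed_price_ratio
    return 1.0 - remaining_cost


print(round(combined_savings(0.7, 0.5), 2))  # 0.85
print(round(combined_savings(0.6, 1.0), 2))  # 0.6
```

Under these assumptions the two effects multiply: compression alone at 60% saves 60%, while 70% compression plus a model at half the price leaves 15% of the original cost.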