LangChain

Integrate Laghav into any LangChain pipeline using LaghavCallbackHandler (token tracking) and LaghavContextCompressor (RAG document compression).

Installation

bash
pip install "laghav[langchain]"
# installs: laghav + langchain-core

LaghavCallbackHandler

Pass the handler to any LangChain LLM or chain via the callbacks parameter. It intercepts every LLM call, compresses and routes it through Laghav, and records savings metadata.

langchain_basic.py
# ChatAnthropic ships in the separate langchain-anthropic package
from langchain_anthropic import ChatAnthropic
from laghav.integrations.langchain import LaghavCallbackHandler

llm = ChatAnthropic(model="claude-opus-4")
handler = LaghavCallbackHandler(
    api_key="lgh_live_...",
    compress=True,
    route=True,
    max_aggressiveness=0.6,
)

# Callbacks are passed per call through the config argument
response = llm.invoke("Summarize the revenue report", config={"callbacks": [handler]})

# After the call, read the metadata recorded for the most recent request
print(handler.last_meta.saved_usd)       # e.g. 0.043
print(handler.last_meta.quality_score)   # e.g. 94
print(handler.last_meta.routing_reason)  # e.g. "analytical"

Agent tracking (session_summary)

langchain_agent.py
from langchain.agents import AgentExecutor, create_react_agent
from laghav.integrations.langchain import LaghavCallbackHandler

handler = LaghavCallbackHandler(api_key="lgh_live_...")

# llm, tools, and prompt are assumed to be defined elsewhere
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, callbacks=[handler])
result = agent_executor.invoke({"input": "Research and summarize Q3 revenue trends"})

# After the agent completes all steps
summary = handler.session_summary
print(f"Total calls: {summary['total_llm_calls']}")
print(f"Tokens saved: {summary['total_original_tokens'] - summary['total_compressed_tokens']}")
print(f"Total saved ($): {summary['total_saved_usd']:.3f}")
print(f"Avg quality: {summary['avg_quality_score']}")
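
The summary keys read above suggest a simple fold over per-call records. The following is a minimal sketch of how such an aggregate could be computed, not Laghav's actual implementation: the CallMeta class, its field names, and the sample numbers are all hypothetical and exist only to illustrate the arithmetic behind each key.

```python
from dataclasses import dataclass


@dataclass
class CallMeta:
    """Hypothetical per-call record; the field names are assumptions."""
    original_tokens: int
    compressed_tokens: int
    saved_usd: float
    quality_score: float


def summarize(calls: list[CallMeta]) -> dict:
    """Fold per-call records into a dict with the keys used in the example above."""
    n = len(calls)
    return {
        "total_llm_calls": n,
        "total_original_tokens": sum(c.original_tokens for c in calls),
        "total_compressed_tokens": sum(c.compressed_tokens for c in calls),
        "total_saved_usd": sum(c.saved_usd for c in calls),
        # Guard against division by zero when no LLM calls were made.
        "avg_quality_score": sum(c.quality_score for c in calls) / n if n else 0.0,
    }


# Two made-up agent steps, purely for illustration.
calls = [
    CallMeta(original_tokens=1200, compressed_tokens=500, saved_usd=0.021, quality_score=95),
    CallMeta(original_tokens=800, compressed_tokens=300, saved_usd=0.014, quality_score=93),
]
summary = summarize(calls)
tokens_saved = summary["total_original_tokens"] - summary["total_compressed_tokens"]
print(f"Tokens saved: {tokens_saved}")  # Tokens saved: 1200
```

Note that the average quality score here is an unweighted mean over calls; a token-weighted mean would be an equally plausible choice.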

RAG context compression

Use LaghavContextCompressor to compress retrieved documents before feeding them to the LLM in a RAG pipeline:

langchain_rag.py
from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from laghav.integrations.langchain import LaghavContextCompressor

compressor = LaghavContextCompressor(
    api_key="lgh_live_...",
    max_tokens_per_chunk=200,  # max tokens per document chunk
)

# Wraps any LangChain retriever; your_retriever is assumed to be defined elsewhere
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=your_retriever,
)

# Now use it in a RetrievalQA chain (llm is assumed to be defined elsewhere)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
result = chain.invoke({"query": "What caused the revenue decline?"})
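
One practical effect of a per-chunk cap such as max_tokens_per_chunk is that it bounds the total context size at (number of chunks) × (cap). The toy sketch below illustrates that bound only. It assumes whitespace-separated words as a stand-in for tokens and plain truncation as a stand-in for compression; neither is claimed to be how Laghav works, and the function names are hypothetical.

```python
def cap_chunk(text: str, max_tokens: int) -> str:
    """Keep at most max_tokens whitespace-separated words (a crude token proxy)."""
    words = text.split()
    return " ".join(words[:max_tokens])


def cap_chunks(chunks: list[str], max_tokens_per_chunk: int) -> list[str]:
    """Apply the same cap to every retrieved chunk."""
    return [cap_chunk(chunk, max_tokens_per_chunk) for chunk in chunks]


# Three made-up chunks of 500, 2, and 250 words.
chunks = ["word " * 500, "short chunk", "w " * 250]
capped = cap_chunks(chunks, max_tokens_per_chunk=200)

sizes = [len(chunk.split()) for chunk in capped]
upper_bound = len(chunks) * 200  # the context can never exceed this many tokens

print(sizes)        # [200, 2, 200]
print(upper_bound)  # 600
```

Chunks already under the cap pass through unchanged, so the bound is a worst case rather than the typical size.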
Savings scale with RAG
RAG pipelines typically retrieve 5–10 document chunks per query. Laghav compresses each chunk, usually reducing the total context by 60–80% before the generation step. Combined with routing, this often cuts RAG query costs by 85% or more.
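
To see how the figures above can combine, here is a rough back-of-the-envelope sketch. The chunk size of 500 tokens and the 50% price reduction from routing are illustrative assumptions, not Laghav numbers, and the calculation covers input-token cost only.

```python
def compressed_context_tokens(n_chunks: int, tokens_per_chunk: int, reduction: float) -> int:
    """Context size after compressing every chunk by the given fraction."""
    return round(n_chunks * tokens_per_chunk * (1 - reduction))


def combined_saving(token_reduction: float, price_reduction: float) -> float:
    """Fraction of input cost saved when token count and per-token price both drop."""
    return 1 - (1 - token_reduction) * (1 - price_reduction)


# Assumed scenario: 8 chunks of 500 tokens, compressed by 70%.
before = 8 * 500
after = compressed_context_tokens(n_chunks=8, tokens_per_chunk=500, reduction=0.70)

# Assumed: routing sends the query to a model that costs half as much per token.
saving = combined_saving(token_reduction=0.70, price_reduction=0.50)

print(f"{before} -> {after} tokens")     # 4000 -> 1200 tokens
print(f"Combined saving: {saving:.0%}")  # Combined saving: 85%
```

The two effects multiply rather than add: a 70% token reduction leaves 30% of the cost, and halving the per-token price leaves 15%, which is an 85% saving.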