LangChain

Integrate Laghav into any LangChain pipeline using LaghavCallbackHandler (token tracking) and LaghavContextCompressor (RAG document compression).

Installation

bash

pip install "laghav[langchain]"
# installs: laghav + langchain-core

LaghavCallbackHandler

Pass the handler to any LangChain LLM or chain through the callbacks parameter. It intercepts every LLM call, compresses the prompt and routes it through Laghav, and records savings metadata for each call.

langchain_basic.py

# ChatAnthropic now ships in the langchain-anthropic package:
#   pip install langchain-anthropic
from langchain_anthropic import ChatAnthropic
from laghav.integrations.langchain import LaghavCallbackHandler

llm = ChatAnthropic(model="claude-opus-4")
handler = LaghavCallbackHandler(
    api_key="lgh_live_...",
    compress=True,
    route=True,
    max_aggressiveness=0.6,
)

# Callbacks are passed per call through the config argument
response = llm.invoke("Summarize the revenue report", config={"callbacks": [handler]})

# After the call (values shown are examples)
print(handler.last_meta.saved_usd)        # e.g. 0.043
print(handler.last_meta.quality_score)    # e.g. 94
print(handler.last_meta.routing_reason)   # e.g. "analytical"

Agent tracking (session_summary)

langchain_agent.py

from langchain.agents import AgentExecutor, create_react_agent
from laghav.integrations.langchain import LaghavCallbackHandler

handler = LaghavCallbackHandler(api_key="lgh_live_...")

# llm, tools, and prompt are assumed to be defined elsewhere in your application
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, callbacks=[handler])
result = agent_executor.invoke({"input": "Research and summarize Q3 revenue trends"})

# After agent completes all steps
summary = handler.session_summary
print(f"Total calls:      {summary['total_llm_calls']}")
print(f"Tokens saved:     {summary['total_original_tokens'] - summary['total_compressed_tokens']}")
print(f"Total saved ($):  {summary['total_saved_usd']:.3f}")
print(f"Avg quality:      {summary['avg_quality_score']}")
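The summary fields can be turned into a savings ratio with a few lines of Python. The sketch below assumes that session_summary behaves like a plain dict with the keys shown above. The helper name savings_report and the sample numbers are hypothetical and are not part of Laghav.

```python
def savings_report(summary: dict) -> dict:
    """Derive token savings from a session_summary-style dict."""
    original = summary["total_original_tokens"]
    compressed = summary["total_compressed_tokens"]
    saved = original - compressed
    # Guard against an empty session (no tokens sent) to avoid dividing by zero.
    pct = (saved / original * 100) if original else 0.0
    return {"tokens_saved": saved, "percent_saved": round(pct, 1)}


# Hypothetical values, for illustration only.
example_summary = {
    "total_llm_calls": 4,
    "total_original_tokens": 12000,
    "total_compressed_tokens": 4800,
    "total_saved_usd": 0.108,
    "avg_quality_score": 93,
}
report = savings_report(example_summary)
print(report)  # {'tokens_saved': 7200, 'percent_saved': 60.0}
```

In a real run you would pass handler.session_summary instead of the example dict.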

RAG context compression

Use LaghavContextCompressor to compress retrieved documents before feeding them to the LLM in a RAG pipeline:

langchain_rag.py

from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from laghav.integrations.langchain import LaghavContextCompressor

compressor = LaghavContextCompressor(
    api_key="lgh_live_...",
    max_tokens_per_chunk=200,   # max tokens per document chunk
)

# Wraps any LangChain retriever (your_retriever is your existing retriever)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=your_retriever,
)

# Now use in a RetrievalQA chain (llm is any LangChain chat model)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
result = chain.invoke({"query": "What caused the revenue decline?"})
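To get a feel for what a per-chunk token budget such as max_tokens_per_chunk means, here is a deliberately naive stand-in. It is not Laghav's compression algorithm, which this page does not describe. It simply truncates each chunk, and it counts whitespace-separated words rather than model tokens. The function name cap_chunks is hypothetical.

```python
def cap_chunks(chunks: list[str], max_tokens_per_chunk: int) -> list[str]:
    """Truncate each chunk to at most max_tokens_per_chunk whitespace-separated words.

    Naive illustration only: a real compressor would try to preserve meaning
    instead of cutting text off, and would count model tokens, not words.
    """
    if max_tokens_per_chunk < 0:
        raise ValueError("max_tokens_per_chunk must be non-negative")
    capped = []
    for text in chunks:
        words = text.split()
        capped.append(" ".join(words[:max_tokens_per_chunk]))
    return capped


docs = ["revenue fell sharply in the third quarter", "costs rose"]
capped_docs = cap_chunks(docs, 3)
print(capped_docs)  # ['revenue fell sharply', 'costs rose']
```

Whatever the compression method, a per-chunk cap bounds the retrieved context at the number of chunks multiplied by the cap.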

Savings scale with RAG

RAG pipelines typically retrieve 5–10 document chunks per query. Laghav compresses each chunk, which usually reduces the total retrieved context by 60–80% before the generation step. Combined with routing, this often saves 85% or more on RAG query costs.
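As a rough worked example of the 60–80% range, the sketch below computes how many context tokens remain after compression. The workload (8 chunks of 500 tokens each) and the function name context_after_compression are hypothetical. The calculation covers context size only and does not model routing or pricing.

```python
def context_after_compression(
    num_chunks: int, tokens_per_chunk: int, reduction_pct: int
) -> tuple[int, int]:
    """Return (original_tokens, remaining_tokens) for a given percentage reduction."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    original = num_chunks * tokens_per_chunk
    # Integer arithmetic avoids floating-point rounding surprises.
    remaining = original * (100 - reduction_pct) // 100
    return original, remaining


# Hypothetical workload: 8 retrieved chunks of 500 tokens each.
low = context_after_compression(8, 500, 60)
high = context_after_compression(8, 500, 80)
print(low)   # (4000, 1600)
print(high)  # (4000, 800)
```

Under these assumptions, a 4,000-token context shrinks to between 800 and 1,600 tokens before it reaches the model.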
