Alpha Notice: These docs cover the v1-alpha release. Content is incomplete and subject to change.
For the latest stable version, see the v0 LangChain Python or LangChain JavaScript docs.
Middleware provides a way to more tightly control what happens inside the agent.
The core agent loop involves calling a model, letting it choose tools to execute, and then finishing when it calls no more tools.
Middleware provides control over what happens before and after those steps.
Each middleware can add up to three types of modifiers:
before_model: Runs before model execution. Can update state or jump to a different node (model, tools, end)
modify_model_request: Runs before model execution, to prepare the model request object. Can only modify the current model request object (no permanent state updates) and cannot jump to a different node.
after_model: Runs after model execution, before tools are executed. Can update state or jump to a different node (model, tools, end)
In addition to that, each middleware can define the following static properties:
name: The name of the middleware (required)
tools: The tools that the middleware makes available to the agent (optional)
state_schema: The schema of the state that the middleware requires (optional)
A middleware can implement any combination of before_model, modify_model_request, and after_model. It does not need to implement all three.
Middleware is highly flexible and replaces some other functionality in the agent.
As such, when middleware is used, there are some restrictions on the arguments used to create the agent:
model must be either a string or a BaseChatModel. Passing a function raises an error. If you want to dynamically control the model, use AgentMiddleware.modify_model_request.
prompt must be either a string or None. Passing a function raises an error. If you want to dynamically control the prompt, use AgentMiddleware.modify_model_request.
pre_model_hook must not be provided. Use AgentMiddleware.before_model instead.
post_model_hook must not be provided. Use AgentMiddleware.after_model instead.
The SummarizationMiddleware automatically manages conversation history by summarizing older messages when token limits are approached. This middleware monitors the total token count of messages and creates concise summaries to preserve context while staying within model limits.
Key features:
Automatic token counting and threshold monitoring
Intelligent message partitioning that preserves AI/Tool message pairs
Customizable summary prompts and token limits
Use Cases:
Long-running conversations that exceed token limits
Multi-turn dialogues with extensive context
```python
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[weather_tool, calculator_tool],
    middleware=[
        SummarizationMiddleware(
            model="openai:gpt-4o-mini",
            max_tokens_before_summary=4000,  # Trigger summarization at 4000 tokens
            messages_to_keep=20,  # Keep last 20 messages after summary
            summary_prompt="Custom prompt for summarization...",  # Optional
        ),
    ],
)
```
Configuration options:
model: Language model to use for generating summaries (required)
max_tokens_before_summary: Token threshold that triggers summarization
messages_to_keep: Number of recent messages to preserve (default: 20)
token_counter: Custom function for counting tokens (defaults to character-based approximation)
summary_prompt: Custom prompt template for summary generation
summary_prefix: Prefix added to system messages containing summaries (default: "## Previous conversation summary:")
The middleware ensures tool call integrity by:
Never splitting AI messages from their corresponding tool responses
Preserving the most recent messages for continuity
Including previous summaries in new summarization cycles
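The partitioning behaviour described above can be illustrated with a small, library-free sketch. This is not the SummarizationMiddleware implementation: the plain-dict messages, the `approx_tokens` heuristic (roughly 4 characters per token), and the `find_safe_cutoff` and `partition` helpers are all hypothetical names chosen for illustration. The only point is to show how a cutoff can be moved backwards so that a tool result is never separated from the AI message that requested it.

```python
def approx_tokens(messages: list[dict]) -> int:
    """Character-based approximation: assume about 4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4


def find_safe_cutoff(messages: list[dict], messages_to_keep: int) -> int:
    """Index where the 'recent' slice starts; it never starts on a tool message."""
    cutoff = max(len(messages) - messages_to_keep, 0)
    # A tool message answers an earlier AI message, so walk back until the
    # recent slice begins with something other than a tool result.
    while 0 < cutoff < len(messages) and messages[cutoff]["role"] == "tool":
        cutoff -= 1
    return cutoff


def partition(messages: list[dict], max_tokens_before_summary: int, messages_to_keep: int):
    """Return (to_summarize, to_keep); nothing is summarized below the threshold."""
    if approx_tokens(messages) < max_tokens_before_summary:
        return [], list(messages)
    cutoff = find_safe_cutoff(messages, messages_to_keep)
    return messages[:cutoff], messages[cutoff:]


history = [
    {"role": "human", "content": "What is the weather in Paris? " * 2},
    {"role": "ai", "content": "Calling the weather tool."},
    {"role": "tool", "content": "18C, cloudy"},
    {"role": "ai", "content": "It is 18C and cloudy in Paris."},
    {"role": "human", "content": "Thanks!"},
]

old, recent = partition(history, max_tokens_before_summary=10, messages_to_keep=3)
print([m["role"] for m in old])     # ['human']
print([m["role"] for m in recent])  # ['ai', 'tool', 'ai', 'human']
```

Note that in this sketch the kept slice can end up slightly longer than `messages_to_keep`, because keeping the AI/tool pair together takes priority over the exact count.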
The HumanInTheLoopMiddleware enables human oversight and intervention for tool calls made by the agent. See the human-in-the-loop documentation for more details.
This middleware intercepts tool executions and allows human operators to approve, modify, reject, or manually respond to tool calls before they execute.
AnthropicPromptCachingMiddleware enables Anthropic’s native prompt caching.
Prompt caching optimizes API usage by allowing requests to resume from specific prefixes in your prompts.
This is particularly useful for tasks with repetitive prompts or prompts with redundant information.
Learn more about Anthropic Prompt Caching (strategies, limitations, etc.) in Anthropic’s prompt caching documentation.
When using prompt caching, you’ll likely want to use a checkpointer to store conversation
history across invocations.
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
from langchain.agents import create_agent
from langchain.agents.middleware.prompt_caching import AnthropicPromptCachingMiddleware

LONG_PROMPT = """
Please be a helpful assistant.

<Lots more context ...>
"""

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-latest"),
    prompt=LONG_PROMPT,
    middleware=[AnthropicPromptCachingMiddleware(ttl="5m")],
)

# Cache store: the first call writes the system prompt to the cache
agent.invoke({"messages": [HumanMessage("Hi, my name is Bob")]})

# Cache hit: the system prompt is already cached
agent.invoke({"messages": [HumanMessage("What's my name?")]})
```
A system prompt can be set dynamically right before each model invocation using the @modify_model_request decorator. This middleware is particularly useful when the prompt depends on the current agent state or runtime context.
For example, the system prompt can be adjusted based on the user’s expertise level.
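As a rough illustration of the expertise-level idea, here is a library-free sketch. `ModelRequestSketch` is a hypothetical stand-in for the request object, and the `expertise` state key and the `PROMPTS` table are assumptions made for this example; in a real agent the function would be registered with the @modify_model_request decorator and would receive the library's own request and state objects.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRequestSketch:
    """Hypothetical stand-in for the request passed to modify_model_request."""
    system_prompt: str = ""
    messages: list = field(default_factory=list)


# Assumed mapping from expertise level to system prompt (illustrative only).
PROMPTS = {
    "beginner": "Explain concepts simply and avoid jargon.",
    "expert": "Be concise and use precise technical terminology.",
}


def expertise_prompt(request: ModelRequestSketch, state: dict) -> ModelRequestSketch:
    """Pick a system prompt from the (assumed) 'expertise' key in state."""
    level = state.get("expertise", "beginner")
    # Only the request for this model call changes; state is left untouched.
    request.system_prompt = PROMPTS.get(level, PROMPTS["beginner"])
    return request


request = expertise_prompt(ModelRequestSketch(), {"expertise": "expert"})
print(request.system_prompt)  # Be concise and use precise technical terminology.
```

Because the hook only edits the request, the prompt choice is recomputed on every model call and never written back to the agent's state.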
Middleware for agents is written as a subclass of AgentMiddleware that implements one or more of its hooks.
AgentMiddleware currently provides three different ways to modify the core agent loop:
before_model: runs before the model is run. Can update state or exit early with a jump.
modify_model_request: runs before the model is run. Cannot update state or exit early with a jump.
after_model: runs after the model is run. Can update state or exit early with a jump.
In order to exit early, you can add a jump_to key to the state update with one of the following values:
"model": Jump to the model node
"tools": Jump to the tools node
"end": Jump to the end node
If this is specified, no subsequent middleware will run.
Learn more about exiting early in the agent jumps section.
Runs before the model is run. Can modify state by returning a new state object or state update.
Signature:
```python
from typing import Any

from langchain.agents.middleware import AgentMiddleware, AgentState
from langchain_core.messages import AIMessage


class MyMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState) -> dict[str, Any] | None:
        # Terminate early if the conversation is too long
        if len(state["messages"]) > 50:
            return {
                "messages": [AIMessage("I'm sorry, the conversation has been terminated.")],
                "jump_to": "end",
            }
        # No state update: continue to the model as usual
        return None
```
Runs before the model has run, but after all the before_model calls.
These functions cannot modify permanent state or exit early.
Rather, they are intended to modify calls to the model in a stateless way.
If you want to modify calls to the model in a stateful way, you will need to use before_model.
This hook modifies the model request. The model request has several key properties:
model (BaseChatModel): the model to use. Note: this needs to be a BaseChatModel instance, not a string.
system_prompt (str): the system prompt to use. Will get prepended to messages
messages (list of messages): the message list. Should not include system prompt.
tool_choice (Any): the tool choice to use
tools (list of strings): the tool names to use for this model call
response_format (ResponseFormat): the response format to use for structured output
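To make these properties more concrete, the sketch below shows one way a request could be turned into the final model call: the system prompt is prepended to the message list, and tool names are resolved against the tools registered with the agent. `RequestSketch` and `assemble_call` are hypothetical names, and raising on an unknown tool name is an assumption of this sketch, not documented library behaviour.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class RequestSketch:
    """Hypothetical stand-in mirroring the properties listed above."""
    model: Any
    system_prompt: Optional[str]
    messages: list
    tools: list  # tool names, not tool objects
    tool_choice: Any = None
    response_format: Any = None


def assemble_call(request: RequestSketch, registered_tools: dict[str, Callable]):
    """Build the message list and tool list for a single model call."""
    messages = list(request.messages)  # copy so the request is not mutated
    if request.system_prompt:
        # The system prompt is kept separate and prepended at call time.
        messages.insert(0, {"role": "system", "content": request.system_prompt})
    missing = [name for name in request.tools if name not in registered_tools]
    if missing:
        # Assumption: only tools registered upfront can be selected by name.
        raise ValueError(f"Tools not registered with the agent: {missing}")
    return messages, [registered_tools[name] for name in request.tools]


registered = {
    "search": lambda query: f"results for {query}",
    "calculator": lambda expression: expression,
}
request = RequestSketch(
    model=object(),  # stand-in for a chat model instance
    system_prompt="Be brief.",
    messages=[{"role": "user", "content": "Hi"}],
    tools=["search"],
)
messages, tools = assemble_call(request, registered)
print([m["role"] for m in messages])  # ['system', 'user']
```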
Middleware can extend the agent’s state with custom properties, enabling rich data flow between middleware components and ensuring type safety throughout the agent execution.
Middleware can define additional state properties that persist throughout the agent’s execution. These properties become part of the agent’s state and are available to all hooks for said middleware.
```python
from typing import Any

from langchain.agents.middleware import AgentMiddleware, AgentState


class MyState(AgentState):
    model_call_count: int


class MyMiddleware(AgentMiddleware[MyState]):
    state_schema = MyState

    def before_model(self, state: MyState) -> dict[str, Any] | None:
        # Terminate early if the model has been called too many times
        if state.get("model_call_count", 0) > 10:
            return {"jump_to": "end"}
        return None

    def after_model(self, state: MyState) -> dict[str, Any] | None:
        # Count this model call; treat a missing counter as 0
        return {"model_call_count": state.get("model_call_count", 0) + 1}
```
Context properties are configuration values passed through the runnable config. Unlike state, context is read-only and typically used for configuration that doesn’t change during execution.
You can provide multiple middleware. They are executed as follows:
before_model: Run in the order they are passed in. If an earlier middleware exits early, the following middleware are not run.
modify_model_request: Run in the order they are passed in.
after_model: Run in the reverse of the order they are passed in. If an earlier middleware exits early, the following middleware are not run.
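The ordering rules above can be checked with a small simulation. This is a library-free sketch: `Recorder` and `run_one_model_step` are hypothetical helpers that only log the order in which hooks would fire around a single model call, without any early exits.

```python
class Recorder:
    """Hypothetical middleware stand-in that logs every hook call."""

    def __init__(self, name: str, log: list[str]):
        self.name = name
        self.log = log

    def before_model(self, state: dict) -> None:
        self.log.append(f"{self.name}.before_model")

    def modify_model_request(self, request: dict, state: dict) -> dict:
        self.log.append(f"{self.name}.modify_model_request")
        return request

    def after_model(self, state: dict) -> None:
        self.log.append(f"{self.name}.after_model")


def run_one_model_step(middleware: list[Recorder], state: dict, request: dict) -> None:
    """Fire the hooks around one (omitted) model call in the documented order."""
    for m in middleware:            # in the order passed in
        m.before_model(state)
    for m in middleware:            # in the order passed in
        request = m.modify_model_request(request, state)
    # ... the model would be called here ...
    for m in reversed(middleware):  # reverse order
        m.after_model(state)


log: list[str] = []
run_one_model_step([Recorder("A", log), Recorder("B", log)], state={}, request={})
print(log)
# ['A.before_model', 'B.before_model', 'A.modify_model_request',
#  'B.modify_model_request', 'B.after_model', 'A.after_model']
```

Running after_model in reverse gives the middleware an onion-like nesting: the first middleware passed in is the outermost layer on both sides of the model call.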
In order to exit early, you can add a jump_to key to the state update with one of the following values:
"model": Jump to the model node
"tools": Jump to the tools node
"end": Jump to the end node
If this is specified, no subsequent middleware will run.
If you jump to the model node, all before_model middleware will run. Jumping to model from within a before_model middleware is forbidden.
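A minimal sketch of how a jump_to key in a state update could short-circuit the remaining before_model hooks is shown below. It is an illustration under stated assumptions, not the agent's actual routing code: `run_before_model` is a hypothetical helper, state updates are merged with a plain dict merge, and the forbidden jump to model is modelled as a `ValueError`.

```python
VALID_JUMPS = {"model", "tools", "end"}


def run_before_model(hooks, state: dict):
    """Run hooks in order; stop at the first update that carries jump_to.

    Returns (new_state, destination). Without a jump, the destination is
    the model node.
    """
    for hook in hooks:
        update = hook(state)
        if not update:
            continue  # hook returned None: nothing to merge
        update = dict(update)
        jump = update.pop("jump_to", None)
        state = {**state, **update}  # assumption: simple key-wise merge
        if jump is None:
            continue
        if jump not in VALID_JUMPS:
            raise ValueError(f"Unknown jump target: {jump!r}")
        if jump == "model":
            raise ValueError("before_model hooks may not jump to 'model'")
        return state, jump  # later hooks are skipped
    return state, "model"


calls: list[str] = []


def limit_turns(state: dict):
    calls.append("limit_turns")
    if state["turns"] > 3:
        return {"stopped": True, "jump_to": "end"}
    return None


def audit(state: dict):
    calls.append("audit")
    return None


state, destination = run_before_model([limit_turns, audit], {"turns": 5})
print(destination, calls)  # end ['limit_turns']
```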
In many applications, you may have a large set of tools, but only a small subset is relevant for a specific request. To optimize performance and accuracy, it’s best to expose only the tools that are needed for each request.
Doing so provides several benefits:
Improved accuracy – the model chooses from fewer options.
Permission control – can select tools based on user permissions.
Use middleware to dynamically select which tools are available at runtime based on context.
```python
from langchain.agents import create_agent
from langchain.agents.middleware import AgentState, ModelRequest, modify_model_request


@modify_model_request
def tool_selector(request: ModelRequest, state: AgentState) -> ModelRequest:
    """Middleware to select relevant tools based on state/context."""
    # Select a small, relevant subset of tools based on state/context
    request.tools = ["relevant_tool_1", "relevant_tool_2"]
    return request


agent = create_agent(
    ...,
    tools=all_tools,  # All available tools need to be registered upfront
    # Middleware can be used to select a smaller subset that's relevant
    # for the given run.
    middleware=[tool_selector],
)
```
Extended example: Select tools based on runtime context
This example shows how to select between GitHub and GitLab tools based on the user’s provider.
```python
from dataclasses import dataclass
from typing import Literal

from langchain.agents import create_agent
from langchain.agents.middleware import AgentState, ModelRequest, modify_model_request
from langchain.tools import tool
from langgraph.runtime import get_runtime


@tool
def github_create_issue(repo: str, title: str) -> dict:
    """Create an issue in a GitHub repository."""
    return {"url": f"https://github.com/{repo}/issues/1", "title": title}


@tool
def gitlab_create_issue(project: str, title: str) -> dict:
    """Create an issue in a GitLab project."""
    return {"url": f"https://gitlab.com/{project}/-/issues/1", "title": title}


all_tools = [github_create_issue, gitlab_create_issue]


@dataclass
class Context:
    provider: Literal["github", "gitlab"]


@modify_model_request
def select_tools(request: ModelRequest, state: AgentState) -> ModelRequest:
    """Select tools based on the VCS provider."""
    runtime = get_runtime(Context)
    provider = runtime.context.provider
    selected_tools = ["gitlab_create_issue"] if provider == "gitlab" else ["github_create_issue"]
    request.tools = selected_tools
    return request


agent = create_agent(
    model="openai:gpt-4o",
    tools=all_tools,
    middleware=[select_tools],
    context_schema=Context,
)

# Invoke with GitHub context
agent.invoke(
    {
        "messages": [{
            "role": "user",
            "content": "Open an issue titled 'Bug: where are the cats' in the repository `its-a-cats-game`"
        }]
    },
    context=Context(provider="github"),
)
```
Key points:
Register all tools with the agent upfront
Use middleware to select the relevant subset per request
Define required context properties using context_schema
Use context for configuration that doesn’t change during execution
Use state for values that change during the agent run