Agent Architecture using LangGraph Part II

In the previous chapter, we explored agent architecture—the most advanced LLM architecture we’ve encountered so far. It combines chain-of-thought prompting, tool use, and looping into a powerful framework with immense potential.

In this chapter, we’ll explore two extensions that enhance this architecture for specific scenarios:

  • Reflection: Inspired by human thought processes, reflection allows your LLM application to evaluate its previous outputs and decisions, improving future performance.
  • Multi-agent Systems: Just as collaborative teams achieve more than individuals, some challenges are best addressed by groups of LLM agents working together.

Let’s begin with reflection.

Reflection

One prompting technique we haven’t yet explored is reflection (also known as self-critique). Reflection creates a cycle between a creator prompt and a reviser prompt, similar to the iterative human creation process involving authors, reviewers, and editors. This cyclical improvement continues until stakeholders are satisfied with the final product.

Figure 1. System 1 and System 2 thinking

This method parallels Daniel Kahneman’s concepts of System 1 (reactive) and System 2 (reflective) thinking from Thinking, Fast and Slow. By implementing self-critique, LLM applications can better align with System 2’s more thoughtful processing.

Implementing Reflection as a Graph

We implement reflection as a graph with two nodes: generate and reflect. The task involves writing three-paragraph essays. The generate node creates or revises drafts, while the reflect node critiques them to guide the next revision.

Here’s a simplified Python implementation:

from typing import Annotated, TypedDict

from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages

model = ChatOpenAI()

class State(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]

# Generate node configuration
generate_prompt = SystemMessage(
    "You are an essay assistant tasked with writing excellent 3-paragraph essays. "
    "Generate the best essay possible for the user's request. "
    "If the user provides critique, respond with a revised version of your previous attempts."
)

def generate(state: State) -> State:
    answer = model.invoke([generate_prompt] + state["messages"])
    return {"messages": [answer]}

# Reflection node configuration
reflection_prompt = SystemMessage(
    "You are a teacher grading an essay submission. Generate critique and "
    "recommendations for the user's submission. Provide detailed recommendations, "
    "including requests for length, depth, style, etc."
)

def reflect(state: State) -> State:
    # Swap roles so the model critiques the essay as if a user had submitted it
    cls_map = {AIMessage: HumanMessage, HumanMessage: AIMessage}
    translated = [reflection_prompt, state["messages"][0]] + [
        cls_map[msg.__class__](content=msg.content) for msg in state["messages"][1:]
    ]
    answer = model.invoke(translated)
    # Return the critique as a human message so generate treats it as user feedback
    return {"messages": [HumanMessage(content=answer.content)]}

# Decision function for loop control
def should_continue(state: State):
    if len(state["messages"]) > 6:
        return END  # stop after three generate/reflect rounds
    else:
        return "reflect"

# Building the graph
builder = StateGraph(State)
builder.add_node("generate", generate)
builder.add_node("reflect", reflect)
builder.add_edge(START, "generate")
builder.add_conditional_edges("generate", should_continue)
builder.add_edge("reflect", "generate")

graph = builder.compile()

In this architecture, the reflect node prompts the LLM to critique as if evaluating user-generated essays, while the generate node interprets feedback as user critique. This is necessary since dialogue-tuned LLMs are optimized for paired exchanges, not for handling sequences from a single participant.
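This role swap can be seen without any LLM calls. Here is a minimal sketch, where `swap_roles` and the tuple-based messages are illustrative stand-ins for the LangChain message classes:

```python
# Minimal illustration of the role swap performed by the reflect node.
# Messages are plain (role, content) tuples instead of LangChain classes.

def swap_roles(messages):
    # Keep the original request as-is; flip ai <-> human for everything after,
    # so the critic sees the essay as if a user had submitted it.
    flip = {"ai": "human", "human": "ai"}
    first, rest = messages[0], messages[1:]
    return [first] + [(flip[role], content) for role, content in rest]

history = [
    ("human", "Write an essay on climate policy."),  # original request stays human
    ("ai", "Draft 1 of the essay."),                 # now appears user-authored
    ("human", "Critique: add more depth."),          # prior critique now appears ai-authored
]
swapped = swap_roles(history)
```

After the swap, the draft essay reads as a user submission and the earlier critique reads as the assistant's own prior turn, matching the paired-exchange format dialogue-tuned models expect.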

Potential Variations and Applications

  • Integration with Agent Architecture: Combine reflection with agent architecture by introducing it before final output to improve quality without user intervention.
  • Grounding Critique with External Information: For specific cases like code generation, add a pre-reflect step to run code through a linter for additional input.
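For the code-generation case, that extra input can come from something as simple as Python's built-in `ast` module. A minimal sketch of such a pre-reflect check (`lint_feedback` is a hypothetical helper, not a LangGraph API):

```python
import ast

def lint_feedback(code: str) -> str:
    """Ground the critique with a deterministic check: try to parse the
    generated code and surface any syntax error to the reflect prompt."""
    try:
        ast.parse(code)
        return "No syntax errors found."
    except SyntaxError as exc:
        return f"SyntaxError on line {exc.lineno}: {exc.msg}"
```

The string this returns could then be appended to the reflect node's input, giving the critic objective evidence rather than relying on the model to spot the error itself.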

Tip: When feasible, integrate reflection into your application; it can significantly improve output quality.

Subgraphs in LangGraph

Before diving into multi-agent architectures, it’s important to understand a key concept in LangGraph: subgraphs. Subgraphs are graphs used as part of another graph, with several use cases:

  • Building multi-agent systems (our next topic)
  • Reusing nodes across multiple parent graphs
  • Enabling collaborative work where different teams can independently work on different graph parts

Methods to Add Subgraphs to Parent Graphs

1. Direct Invocation

Use this approach when the parent graph and subgraph share state keys, so no state transformation is needed.

from typing import TypedDict

from langgraph.graph import START, StateGraph

class State(TypedDict):
    foo: str  # Shared key with the subgraph

class SubgraphState(TypedDict):
    foo: str  # Shared key with the parent graph
    bar: str

# Define the subgraph
def subgraph_node(state: SubgraphState):
    return {"foo": state["foo"] + "bar"}

subgraph_builder = StateGraph(SubgraphState)
subgraph_builder.add_node(subgraph_node)
subgraph_builder.add_edge(START, "subgraph_node")
subgraph = subgraph_builder.compile()

# Define the parent graph: the compiled subgraph is added directly as a node
builder = StateGraph(State)
builder.add_node("subgraph", subgraph)
builder.add_edge(START, "subgraph")
graph = builder.compile()

2. Function Invocation

Use this approach when the parent graph and subgraph have different state schemas, so the state must be transformed on the way in and out.

class State(TypedDict):
    foo: str

class SubgraphState(TypedDict):
    # No shared keys with the parent graph
    bar: str
    baz: str

# Define the subgraph
def subgraph_node(state: SubgraphState):
    return {"bar": state["bar"] + "baz"}

subgraph_builder = StateGraph(SubgraphState)
subgraph_builder.add_node(subgraph_node)
subgraph_builder.add_edge(START, "subgraph_node")
subgraph = subgraph_builder.compile()

# Define the parent graph: a node transforms the state, invokes the subgraph,
# and transforms the result back
def node(state: State):
    # Transform parent state to subgraph state
    response = subgraph.invoke({"bar": state["foo"]})
    # Transform the response back to parent state
    return {"foo": response["bar"]}

builder = StateGraph(State)
builder.add_node(node)
builder.add_edge(START, "node")
graph = builder.compile()
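The key-translation pattern is easier to see stripped of LangGraph entirely. A plain-Python analogy, where `child` and `parent_node` are illustrative names:

```python
# Plain-Python analogy of function invocation: the parent translates
# its own key "foo" into the child's key "bar" and back.

def child(state: dict) -> dict:  # plays the role of the subgraph
    return {"bar": state["bar"] + "baz"}

def parent_node(state: dict) -> dict:
    response = child({"bar": state["foo"]})  # transform in
    return {"foo": response["bar"]}          # transform out

result = parent_node({"foo": "hello"})  # {'foo': 'hellobaz'}
```

The wrapper function is the only place that knows about both schemas, which keeps the subgraph reusable under any parent state shape.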

Now that we understand subgraphs, let’s explore one of their most important applications: multi-agent architectures.

Multi-Agent Architectures: Collaborative Problem-Solving

As LLM agents increase in size, complexity, or scope, several challenges can impact performance:

  • Tool Overload: Agents with too many tools may struggle to choose the right one.
  • Context Complexity: Prompt size and tracked elements may exceed model capacity.
  • Need for Specialization: Certain tasks benefit from specialized subsystems.

To address these challenges, consider breaking your application into smaller, independent agents and combining them into a multi-agent system.

Figure 2. Multiple strategies for coordinating multiple agents

Strategies for Connecting Agents

  1. Network: Each agent can communicate with every other agent, deciding independently which agent to execute next.
  2. Supervisor: Communication is centralized through a single agent (the supervisor) that decides which agent to call next.
  3. Hierarchical: A supervisor-of-supervisors structure allows for more complex control.
  4. Custom Workflow: Agents only interact with specific others, with some parts of the flow being deterministic.
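One way to picture the four strategies is as adjacency maps recording who may hand control to whom. A rough sketch for three hypothetical agents a, b, and c:

```python
# The four coordination strategies as adjacency maps: each key lists
# the agents it may hand control to. Agents a, b, c are hypothetical.

network = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}  # anyone to anyone

supervisor_style = {
    "supervisor": ["a", "b", "c"],  # the supervisor picks the next worker
    "a": ["supervisor"], "b": ["supervisor"], "c": ["supervisor"],
}

hierarchical = {
    "top": ["team_1", "team_2"],  # a supervisor of supervisors
    "team_1": ["a", "b"], "team_2": ["c"],
    "a": ["team_1"], "b": ["team_1"], "c": ["team_2"],
}

custom_workflow = {"a": ["b"], "b": ["c"], "c": []}  # fixed, partly deterministic flow
```

Note how the supervisor map concentrates all routing decisions in one node, which is what makes that strategy easy to reason about.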

Supervisor Architecture: A Balanced Approach

The supervisor model strikes a balance between capability and ease of use. Each agent is a graph node, with a supervisor node determining the next step.

Example: Supervisor Node Implementation

from typing import Literal

from langchain_openai import ChatOpenAI
from pydantic import BaseModel

class SupervisorDecision(BaseModel):
    next: Literal["researcher", "coder", "FINISH"]

model = ChatOpenAI(model="gpt-4o", temperature=0)
model = model.with_structured_output(SupervisorDecision)

agents = ["researcher", "coder"]

system_prompt = f"""You are a supervisor managing a conversation between these
workers: {agents}. Given the user request, decide which worker should act next.
Each worker will perform a task and report its results. When the task is
complete, respond with FINISH."""

def supervisor(state):
    messages = [
        ("system", system_prompt),
        *state["messages"],
    ]
    decision = model.invoke(messages)
    return {"next": decision.next}

Integrating into a Larger Graph

from typing import Annotated, Literal, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    next: Literal["researcher", "coder", "FINISH"]

def researcher(state: AgentState):
    response = model.invoke(...)
    return {"messages": [response]}

def coder(state: AgentState):
    response = model.invoke(...)
    return {"messages": [response]}

builder = StateGraph(AgentState)
builder.add_node(supervisor)
builder.add_node(researcher)
builder.add_node(coder)

builder.add_edge(START, "supervisor")
# Route to the worker chosen by the supervisor, or end on FINISH
builder.add_conditional_edges(
    "supervisor",
    lambda state: END if state["next"] == "FINISH" else state["next"],
)
builder.add_edge("researcher", "supervisor")
builder.add_edge("coder", "supervisor")

supervisor_graph = builder.compile()

In this example, messages help subagents observe each other’s progress. This modular approach allows for complex structures where each subagent could function as its own graph with internal state.
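The control flow of this architecture can be sketched deterministically, with a stub in place of the LLM-backed supervisor (all names here are illustrative, not LangGraph APIs):

```python
# Deterministic sketch of the supervisor control loop: the stub routes
# researcher -> coder -> FINISH, and each worker appends to shared messages.

def stub_supervisor(state: dict) -> str:
    calls = state["calls"]
    if "researcher" not in calls:
        return "researcher"
    if "coder" not in calls:
        return "coder"
    return "FINISH"

workers = {
    "researcher": lambda s: s["messages"].append("research notes"),
    "coder": lambda s: s["messages"].append("code draft"),
}

state = {"messages": ["user request"], "calls": []}
while (nxt := stub_supervisor(state)) != "FINISH":
    workers[nxt](state)   # each worker reads and extends the shared messages
    state["calls"].append(nxt)

print(state["messages"])  # ['user request', 'research notes', 'code draft']
```

Control always returns to the supervisor between worker turns, which is exactly the edge structure (`researcher -> supervisor`, `coder -> supervisor`) in the graph above.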

Summary

In this chapter, we’ve explored two pivotal extensions to the agent architecture: reflection and multi-agent systems. We’ve also seen how subgraphs in LangGraph play an essential role in crafting multi-agent systems.

While these extensions significantly enhance LLM agent capabilities, it’s advisable not to jump straight to them when developing a new agent. The most effective starting point is typically the straightforward architecture discussed in the previous chapter.

In the next chapter, we’ll revisit the critical trade-off between reliability and agency—a fundamental consideration in developing LLM applications today. This discussion is particularly relevant for agent and multi-agent architectures, as their increased power can lead to reduced reliability if not carefully managed. We’ll explore the reasons behind this trade-off and outline essential techniques to help you navigate these choices effectively.