5. LLM Agents
LLM-based agents (LLM agents) are a new generation of AI applications that leverage large language models (LLMs) in conjunction with modules such as planning, memory, and tool usage to execute complex tasks. In this chapter, we explore the architecture, components, and functionalities of LLM agents, highlighting how they differ from traditional automation systems.
Core Components of LLM Agents
LLM agents are built upon several core modules:
- LLM as the Controller: Acts as the 'brain' of the agent, processing inputs and guiding overall operations.
- Planning Module: Breaks down user requests into structured steps or subtasks to facilitate systematic execution.
- Memory Module: Stores and retrieves past interactions, thoughts, actions, and observations to maintain context.
- Tool Usage Module: Enables the agent to interact with external tools and APIs (e.g., Wikipedia Search API, Code Interpreter, Math Engine) to obtain information or perform specialized tasks.
How LLM Agents Work
LLM agents operate by processing user input to determine the necessary steps for task completion. The process involves:
- Processing Input: The LLM analyzes the user input and determines what needs to be done.
- Context Maintenance: The memory module helps maintain context by storing previous interactions and relevant information.
- Systematic Planning: The planning module decomposes tasks into actionable steps, ensuring that the overall task is broken down into manageable parts.
- Tool Integration: The tool usage module enables the agent to interact with external resources when required.
A high-level schematic of this architecture is shown below:
[Figure: high-level schematic of the LLM agent architecture]
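The interplay of the four modules can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `Agent`, `fake_llm`, `fake_planner`, and `math_tool` are hypothetical names, the LLM and planner are replaced by hard-coded stand-ins, and the "tool_name: argument" step format is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Toy agent wiring together controller, planner, memory, and tools."""
    controller: Callable[[str], str]          # stand-in for the LLM
    planner: Callable[[str], List[str]]       # request -> list of subtasks
    tools: Dict[str, Callable[[str], str]]    # tool name -> function
    memory: List[str] = field(default_factory=list)

    def run(self, request: str) -> str:
        self.memory.append(f"user: {request}")
        results = []
        for step in self.planner(request):
            # Assumed convention: "tool_name: argument" is routed to a tool,
            # any other step is handled by the controller itself.
            name, _, arg = step.partition(":")
            if name in self.tools:
                outcome = self.tools[name](arg.strip())
            else:
                outcome = self.controller(step)
            self.memory.append(f"{step} -> {outcome}")
            results.append(outcome)
        answer = "; ".join(results)
        self.memory.append(f"agent: {answer}")
        return answer


# Hypothetical stand-ins so the sketch runs without a model or network.
def fake_llm(prompt: str) -> str:
    return f"[LLM answer to '{prompt}']"


def fake_planner(request: str) -> List[str]:
    return ["math: 6 * 7", f"summarize {request}"]


def math_tool(expr: str) -> str:
    # Supports only "a + b" and "a * b" with integer operands.
    a, op, b = expr.split()
    x, y = int(a), int(b)
    return str(x * y if op == "*" else x + y)


agent = Agent(controller=fake_llm, planner=fake_planner, tools={"math": math_tool})
print(agent.run("the result"))
```

In a real agent the planner would itself be an LLM call, and the memory list would feed back into later prompts; here it only records what happened.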
Agents vs. Automation
LLM agents extend beyond simple automation by incorporating decision-making, adaptability, and memory, which allow them to handle complex, dynamic scenarios.
LLM Agents:
- Automate complex workflows efficiently.
- Adapt to various tasks using contextual understanding and advanced reasoning.
- Enhance user interactions by making informed decisions without manual intervention.
- Expand the potential for automation to tasks that traditionally require human insight.
Traditional Automations:
- Follow predefined workflows based on fixed rules.
- Execute repetitive tasks but lack the flexibility to adapt to unexpected scenarios.
- Are limited in handling dynamic or unpredictable environments.
For example:
- Automation tools (e.g., Zapier, n8n) can reply to routine queries such as ASA class times and update FAQs automatically, but they falter on unexpected inputs (e.g., a wrong course name).
- LLM agents can analyze incoming emails to extract intent, reference past interactions and company guidelines, and draft contextually appropriate responses for human review.
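The brittleness of fixed rules is easy to reproduce. The following sketch shows only the automation side of the comparison; the rule table, the reply texts, and the function name `rule_based_reply` are hypothetical and do not reflect how any particular automation tool works internally.

```python
# Fixed-rule automation: exact match on a normalized message, no context,
# and no reasoning about what the sender probably meant.
RULES = {
    "asa class times": "ASA meets at the times listed in the course calendar.",
}


def rule_based_reply(message: str) -> str:
    key = message.strip().lower().rstrip("?")
    return RULES.get(key, "Sorry, I did not understand your request.")


print(rule_based_reply("ASA class times?"))  # matches the rule
print(rule_based_reply("AAS class times?"))  # wrong course name: falls through
```

An LLM agent handling the second message could instead infer the intended course from context or ask a clarifying question, which is the adaptability the comparison above refers to.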
Moreover, in complex workflows, multiple specialized agents can work together. Consider an ecosystem where separate agents function as a personal assistant, marketing assistant, finance & accounting agent, or software developer, collaborating to execute comprehensive tasks.
Notable agent products and frameworks include:
- OpenAI Operator
- Open Operator
- Mindpal
- CrewAI
- Lindy
Planning Module – Without Feedback
The planning module is responsible for decomposing user requests into detailed steps or subtasks. Key features include:
- Task Decomposition: It uses an LLM to generate a detailed plan with subtasks, enabling the agent to reason through the problem.
- Techniques for Decomposition:
  - Chain of Thought: Single-path reasoning that sequentially processes steps.
  - Tree of Thoughts: Multi-path reasoning that explores multiple possible solutions simultaneously.
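The difference between single-path and multi-path reasoning can be illustrated on a toy search problem. This is a simplified sketch under stated assumptions: `propose` and `score` are hypothetical stand-ins for LLM calls that would generate and evaluate candidate thoughts, and the single-path case is modeled as a search of width 1 rather than as a single model generation.

```python
from typing import List, Optional

Path = List[int]


def propose(value: int) -> List[int]:
    """Stand-in for an LLM proposing next thoughts: two arithmetic moves."""
    return [value + 3, value * 2]


def score(value: int, target: int) -> int:
    """Stand-in for an LLM evaluating a thought: closer to the target is better."""
    return -abs(target - value)


def search(start: int, target: int, width: int, max_depth: int) -> Optional[Path]:
    """Level-by-level search keeping the `width` best partial paths.

    width=1 follows a single reasoning path (chain-like);
    width>1 keeps several paths alive in parallel (tree-like).
    """
    frontier: List[Path] = [[start]]
    for _ in range(max_depth):
        candidates = [path + [nxt] for path in frontier for nxt in propose(path[-1])]
        for path in candidates:
            if path[-1] == target:
                return path
        candidates.sort(key=lambda p: score(p[-1], target), reverse=True)
        frontier = candidates[:width]
    return None


print(search(1, 14, width=1, max_depth=4))  # None: the single path overshoots
print(search(1, 14, width=2, max_depth=4))  # [1, 4, 7, 14]
```

With width 1 the search commits to 1, 4, 8, 16 and cannot recover, while width 2 keeps the less promising branch through 7 alive long enough to reach 14.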
Planning with Feedback
Traditional planning modules often lack mechanisms for feedback, making long-horizon planning difficult. To address this limitation, a reflection mechanism is introduced, allowing the model to iteratively refine its execution plan based on feedback from:
- Past actions and observations.
- External feedback from humans or other models.
This iterative process is essential for complex real-world tasks where trial and error are integral to achieving optimal results. Two popular methods that incorporate feedback are ReAct and Reflexion.
ReAct
ReAct enables an LLM to solve complex tasks by interleaving three steps repeatedly:
- Thought: The agent deliberates about the next step.
- Action: The agent executes an action.
- Observation: The agent receives feedback from the environment, which informs further thoughts.
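The Thought, Action, Observation cycle can be sketched as a simple loop. The example below is illustrative only: `scripted_llm` is a hypothetical stand-in that returns hard-coded decisions where a real agent would prompt a model and parse its reply, and `lookup` is a toy tool with a single fact in place of a search API.

```python
from typing import Callable, Dict, List, Tuple


def lookup(query: str) -> str:
    """Hypothetical tool: a tiny lookup table standing in for a search API."""
    facts = {"capital of France": "Paris"}
    return facts.get(query, "no result")


TOOLS: Dict[str, Callable[[str], str]] = {"lookup": lookup}


def scripted_llm(transcript: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, argument)."""
    observations = [line for line in transcript if line.startswith("Observation: ")]
    if not observations:
        return ("I need to look this up.", "lookup", "capital of France")
    answer = observations[-1].split(": ", 1)[1]
    return ("I have what I need.", "finish", answer)


def react(question: str, llm, max_steps: int = 5) -> Tuple[str, List[str]]:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = llm(transcript)        # Thought + chosen Action
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, transcript
        observation = TOOLS[action](arg)              # execute the Action
        transcript.append(f"Observation: {observation}")  # feed the result back
    return "no answer within step budget", transcript


answer, trace = react("What is the capital of France?", scripted_llm)
print(answer)
print("\n".join(trace))
```

The growing transcript is what lets each new thought depend on earlier observations; the `max_steps` budget is a design choice here to keep the loop from running indefinitely.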
Memory in LLM Agents
Memory plays a critical role in maintaining context and enhancing decision-making:
- Internal Logs: The memory module stores past thoughts, actions, and observations, along with interactions between the agent and users.
- Short-term Memory: Represents the agent's current context, maintained via in-context learning. It is limited by the context window size.
- Long-term Memory: Captures the agent's past behaviors and thoughts over extended periods using external vector stores for efficient retrieval.
- Hybrid Memory: Integrates both short-term and long-term memory, enabling long-range reasoning and the accumulation of experiences.
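A hybrid memory of this kind can be sketched with a bounded buffer for short-term memory and a similarity-ranked list for long-term memory. This is a minimal sketch under stated assumptions: `HybridMemory` is a hypothetical class, the bag-of-words `embed` function is a toy replacement for a learned embedding model, and the plain Python list stands in for an external vector store.

```python
import math
from collections import Counter, deque
from typing import Deque, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class HybridMemory:
    def __init__(self, window: int = 3) -> None:
        self.short_term: Deque[str] = deque(maxlen=window)  # mimics a bounded context window
        self.long_term: List[Tuple[Counter, str]] = []       # stand-in for a vector store

    def add(self, entry: str) -> None:
        self.short_term.append(entry)                 # oldest entries fall out automatically
        self.long_term.append((embed(entry), entry))  # everything is kept for retrieval

    def recall(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

    def context(self, query: str) -> List[str]:
        """Most relevant older entries followed by the recent ones, without duplicates."""
        recent = list(self.short_term)
        older = [m for m in self.recall(query, k=2) if m not in recent]
        return older + recent


memory = HybridMemory(window=2)
for note in ["user prefers metric units", "tool call failed with timeout",
             "user asked about weather", "agent replied with forecast"]:
    memory.add(note)

print(list(memory.short_term))
print(memory.recall("which units does the user prefer"))
```

After four entries the short-term buffer holds only the last two, yet the note about metric units can still be retrieved from long-term memory by similarity.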
Memory Formats
Memory can be represented in various formats, including:
- Natural language.
- Embeddings.
- Databases.
- Structured lists.
Some systems, like Ghost in the Minecraft, use a hybrid approach where keys are in natural language and values are embedding vectors, combining the strengths of different formats.
Tool Usage in LLM Agents
The tool usage module empowers agents to interact with external environments and APIs. Examples of tools include:
- Wikipedia Search API
- Code Interpreter
- Math Engine
- Databases and Knowledge Bases
LLM agents leverage tools in diverse ways:
- MRKL Framework: Combines LLMs with expert modules (either LLMs or symbolic processors) to enhance problem-solving.
- Toolformer: Fine-tunes LLMs to utilize external tool APIs effectively.
- Function Calling: Augments LLMs with the capability to use tools by defining a set of APIs that the model can call.
- HuggingGPT: An LLM-powered agent that uses LLMs as a task planner to orchestrate various existing AI models based on their descriptions, thereby solving complex AI tasks.
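Of these approaches, function calling is the easiest to sketch in code. The example below assumes the model emits a JSON object with a tool name and arguments; the layout of `TOOL_SPECS`, the `dispatch` helper, and the two tools are hypothetical and are not tied to any specific vendor's function-calling format. The model reply is simulated by a hard-coded string.

```python
import json
from typing import Any, Callable, Dict


def add(a: float, b: float) -> float:
    return a + b


def word_count(text: str) -> int:
    return len(text.split())


# Descriptions the model would be shown so it can choose a tool.
# The schema layout is an assumption made for this sketch.
TOOL_SPECS = [
    {"name": "add", "description": "Add two numbers.",
     "parameters": {"a": "number", "b": "number"}},
    {"name": "word_count", "description": "Count the words in a text.",
     "parameters": {"text": "string"}},
]

REGISTRY: Dict[str, Callable[..., Any]] = {"add": add, "word_count": word_count}


def dispatch(model_output: str) -> Any:
    """Parse a JSON tool call emitted by the model and run the matching function."""
    call = json.loads(model_output)
    name, args = call["name"], call.get("arguments", {})
    if name not in REGISTRY:
        raise ValueError(f"unknown tool: {name}")
    return REGISTRY[name](**args)


# Simulated model reply requesting a tool call.
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

Keeping the callable registry separate from the descriptions means the model only ever sees names and parameter types, while the agent decides which functions are actually executable.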