Abstract: The applications of large language models (LLMs) have expanded well beyond the confines of text processing, signaling a new era where LLMs are envisioned as generalist agents capable of operating within complex environments. These environments are often highly expansive, making it impossible for the LLM to process them within its short-term memory. Motivated by recent research on extending the capabilities of LLMs with tools, we seek to investigate the intriguing potential of tools to augment LLMs in handling such complexity by introducing a novel class of tools, termed middleware, to aid in the proactive exploration within these massive environments. Such specialized tools can serve as a middleware layer shielding the LLM from environmental complexity. In two representative complex environments, knowledge bases (KBs) and databases, we demonstrate the significant potential of augmenting language agents with tools in complex environments. Notably, equipped with the middleware, GPT-4 achieves 2.8× the performance of the best baseline in tasks requiring access to database content and 2.2× in KB tasks. Our findings illuminate the path for advancing language agents in real-world applications.
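The middleware idea can be made concrete with a minimal sketch, assuming a SQLite database. Rather than serializing the whole database into the prompt, the hypothetical `DatabaseMiddleware` class below exposes a few small exploration tools (`list_tables`, `list_columns`, `find_columns_containing`, `sample_values`) whose short outputs are all the agent ever sees. These tool names, the `max_rows` cap, and the toy `schools` table are illustrative assumptions, not the paper's actual tool set.

```python
import sqlite3


class DatabaseMiddleware:
    """Hypothetical middleware layer over a SQLite database.

    Instead of serializing the whole database into the prompt, an agent calls
    these small exploration tools and only sees their (short) outputs.
    """

    def __init__(self, conn: sqlite3.Connection, max_rows: int = 5):
        self.conn = conn
        self.max_rows = max_rows  # cap on how many values one call returns

    def list_tables(self) -> list[str]:
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [row[0] for row in rows]

    def list_columns(self, table: str) -> list[str]:
        # PRAGMA does not accept bound parameters, so validate the name first.
        if table not in self.list_tables():
            raise ValueError(f"unknown table {table!r}")
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        return [row[1] for row in self.conn.execute(f'PRAGMA table_info("{table}")')]

    def find_columns_containing(self, value: str) -> list[tuple[str, str]]:
        """Return (table, column) pairs with at least one cell equal to `value`."""
        hits = []
        for table in self.list_tables():
            for column in self.list_columns(table):
                query = f'SELECT 1 FROM "{table}" WHERE "{column}" = ? LIMIT 1'
                if self.conn.execute(query, (value,)).fetchone():
                    hits.append((table, column))
        return hits

    def sample_values(self, table: str, column: str) -> list:
        """Return up to `max_rows` distinct values from one column."""
        if column not in self.list_columns(table):
            raise ValueError(f"unknown column {column!r} in table {table!r}")
        query = f'SELECT DISTINCT "{column}" FROM "{table}" LIMIT ?'
        return [row[0] for row in self.conn.execute(query, (self.max_rows,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (name TEXT, county TEXT)")
    conn.executemany(
        "INSERT INTO schools VALUES (?, ?)",
        [("Lincoln High", "Alameda"), ("Oak Elementary", "Fresno")],
    )
    mw = DatabaseMiddleware(conn)
    # The agent learns where "Fresno" lives without seeing the full table.
    print(mw.find_columns_containing("Fresno"))  # [('schools', 'county')]
```

Capping each observation at `max_rows` values is the point of the design in this sketch: the agent spends context only on what it explicitly asks about, no matter how large the underlying database grows.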

What problem does this paper attempt to address?

The paper addresses the limitations of large language models (LLMs) in complex environments. Environments such as knowledge bases and databases are so large that their contents cannot be loaded into the LLM's short-term memory all at once. The authors therefore propose a framework that augments LLMs with middleware tools, enabling them to navigate and operate in these environments more effectively.

### Main Issues

1. **Handling Complex Environments**: Existing LLMs struggle with complex environments, especially when they must access large-scale databases or knowledge bases. Traditional linearization methods (i.e., converting the environment description into a sequence of discrete tokens) do not scale to large-scale data.
2. **Tool Augmentation**: The open question is how to augment LLMs with tools so that they can navigate and operate in complex environments more effectively, and thereby perform better.

### Solutions

1. **Middleware Tools**: The authors design a set of tools tailored to complex environments, called middleware. These tools act as an intermediary layer between the LLM and the environment, letting the LLM proactively explore and gather the information it needs without handling every detail of the environment directly.
2. **Tool Usage Strategies**: To make full use of the LLM's reasoning ability, the authors propose two new tool usage strategies:
   - **Error Feedback**: When the LLM makes a mistake while using a tool, it receives specific error information that guides it to correct the mistake on its own.
   - **Decoupled Generation**: The LLM's reasoning steps are generated separately from its tool usage, improving control and accuracy.

### Experimental Results

1. **Database Tasks**: On the BIRD dataset, GPT-4 equipped with middleware tools reaches 38.3% on tasks that require access to database content, 2.8 times the 13.8% of the best baseline.
2. **Knowledge Base Tasks**: On the KBQA-AGENT dataset, GPT-4 equipped with middleware tools reaches 59.3% on multi-hop reasoning tasks, 2.2 times the 27.1% of the best baseline.

### Main Contributions

1. **New Framework**: A new framework for studying how customized tools help LLMs handle complex environments.
2. **Comprehensive Evaluation**: Detailed benchmarking of six different LLMs, validating the effectiveness of tool augmentation.
3. **Key Findings**: Tool augmentation substantially improves LLM performance in complex environments, opening new possibilities for applying LLMs in real-world settings.

In summary, by introducing middleware tools and new tool usage strategies, the paper substantially improves the capabilities and performance of LLMs in complex environments.
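The **Error Feedback** strategy can be illustrated with a minimal sketch, assuming tools are plain Python functions registered in a dictionary. The hypothetical `call_tool` dispatcher below turns each failure mode (unknown tool name, malformed arguments, an exception raised inside the tool) into a specific message that the LLM would read on its next turn. The `get_relations` tool, the relation name, and the entity ID `m.0abc` are toy stand-ins, not the paper's tools or real KB data, and the sketch does not model Decoupled Generation.

```python
import inspect
from typing import Any, Callable


def call_tool(tools: dict[str, Callable[..., Any]], name: str, args: dict[str, Any]) -> str:
    """Run one tool call and return text for the LLM's next turn.

    Every failure mode yields a specific, actionable message (error feedback)
    instead of a generic failure, so the model can repair its own call.
    """
    if name not in tools:
        return f"Error: unknown tool {name!r}. Available tools: {', '.join(sorted(tools))}."
    func = tools[name]
    sig = inspect.signature(func)
    try:
        sig.bind(**args)  # check argument names without running the tool
    except TypeError as exc:
        return f"Error: bad arguments for {name}{sig}: {exc}."
    try:
        return f"Observation: {func(**args)!r}"
    except Exception as exc:  # pass the tool's own message back to the model
        return f"Error: {name} raised {type(exc).__name__}: {exc}"


# Toy knowledge-base tool; the entity ID and relation name are made up.
def get_relations(entity: str) -> list[str]:
    kb = {"m.0abc": ["people.person.place_of_birth"]}
    if entity not in kb:
        raise ValueError(f"entity {entity!r} not found; pass an entity ID such as 'm.0abc'")
    return kb[entity]


tools = {"get_relations": get_relations}

if __name__ == "__main__":
    for call in [
        ("get_relation", {"entity": "m.0abc"}),   # misspelled tool name
        ("get_relations", {"ent": "m.0abc"}),     # wrong argument name
        ("get_relations", {"entity": "Obama"}),   # surface form, not an ID
        ("get_relations", {"entity": "m.0abc"}),  # valid call
    ]:
        print(call_tool(tools, *call))
```

Binding the arguments with `inspect.signature(...).bind` before execution separates "the call was malformed" from "the tool ran and failed", so each message can tell the model which part of its call to fix.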

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making

Towards a Middleware for Large Language Models

TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents

TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

KwaiAgents: Generalized Information-seeking Agent System with Large Language Models

Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information

Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

Large Language Model Powered Agents in the Web

Enhancing Pipeline-Based Conversational Agents with Large Language Models

LLM With Tools: A Survey

From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs

Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Navigating Complexity: Orchestrated Problem Solving with Multi-Agent LLMs

Embodied LLM Agents Learn to Cooperate in Organized Teams