Abstract: The emergence of large language models (LLMs) has opened up unprecedented possibilities for automating complex tasks, often at a level comparable to human performance. Despite their capabilities, LLMs still struggle with tasks that demand high accuracy and complexity, owing to their inherent limitations in handling multifaceted problems single-handedly. This paper introduces "Smurfs", a cutting-edge multi-agent framework designed to revolutionize the application of LLMs. By seamlessly transforming a conventional LLM into a synergistic multi-agent ensemble, Smurfs can enhance the model's ability to solve complex tasks at no additional cost. This is achieved through innovative prompting strategies that allocate distinct roles within the model, thereby facilitating collaboration among specialized agents and forming an intelligent multi-agent system. Our empirical investigation on both the open-ended task of StableToolBench and the closed-ended task of HotpotQA showcases Smurfs' superior capability in intricate tool-utilization scenarios. Notably, Smurfs outperforms all baseline methods in both experiments, setting a new state of the art. Furthermore, through comprehensive ablation studies, we dissect the contribution of the core components of the multi-agent framework to its overall efficacy. This not only verifies the effectiveness of the framework but also charts a route for future exploration of multi-agent LLM systems.
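The abstract's central idea, a single underlying LLM playing several specialized roles purely through different prompts, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the role names, the prompts, `RoleAgent`, `solve`, and the `fake_llm` stand-in are hypothetical and are not the paper's prompts or code. A real setup would replace `fake_llm` with a call to an actual model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumed interface: any function mapping (system_prompt, user_message) -> reply text.
LLM = Callable[[str, str], str]

# Hypothetical role prompts; illustrative only, not the prompts used in the paper.
ROLE_PROMPTS: Dict[str, str] = {
    "planner": "You are a planner. Split the task into subtasks, one per line.",
    "executor": "You are an executor. Solve only the subtask you are given.",
    "answerer": "You are an answerer. Combine the subtask results into a final answer.",
}


@dataclass
class RoleAgent:
    """An 'agent' here is just the shared model plus a fixed role prompt."""
    role: str
    llm: LLM

    def run(self, message: str) -> str:
        return self.llm(ROLE_PROMPTS[self.role], message)


def solve(task: str, llm: LLM) -> str:
    # All three agents wrap the same underlying model; only the role prompt differs.
    planner, executor, answerer = (RoleAgent(r, llm) for r in ("planner", "executor", "answerer"))
    subtasks = [line for line in planner.run(task).splitlines() if line.strip()]
    # Each executor call sees only its own subtask, which keeps its context short.
    results = [executor.run(sub) for sub in subtasks]
    report = "\n".join(f"{sub} -> {res}" for sub, res in zip(subtasks, results))
    return answerer.run(f"Task: {task}\n{report}")


def fake_llm(system: str, user: str) -> str:
    """Deterministic stand-in for a real model so the sketch runs offline."""
    if system.startswith("You are a planner"):
        return "1. find A\n2. find B"
    if system.startswith("You are an executor"):
        return f"done({user})"
    return f"ANSWER[{user.count('->')} results]"


print(solve("compare A and B", fake_llm))  # prints ANSWER[2 results]
```

Limiting each executor call to a single subtask is one simple way to avoid passing the full conversation history to every step, while still costing nothing beyond calls to the same model.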

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper addresses the limitations that large language models (LLMs) face on complex tasks requiring high precision, adaptability, and the integration of broad knowledge. Although LLMs can already automate many complex tasks at a level comparable to human performance, they still struggle to handle multifaceted problems single-handedly. The paper proposes a multi-agent framework called "Smurfs," which improves the model's ability to solve complex tasks by turning a conventional LLM into a collaborative multi-agent ensemble.

### Specific Problems and Solutions

1. **Multi-Tool Planning Challenges**:
   - **Effective Solution Planning**: Existing methods such as ReAct and DFSDT struggle to plan effective solutions when a task requires multiple tools.
   - **Adaptability to New Tools**: LLMs find it difficult to adapt quickly to new tools when solving problems with multiple tools.
2. **Limitations of Existing Methods**:
   - **ReAct**: Although it introduces a thought-action-observation format, it remains limited in multi-tool planning.
   - **DFSDT**: Although it performs well in multi-tool planning, it suffers from an unstable rollback mechanism, context redundancy, and premature termination.
3. **Innovations of the Smurfs Framework**:
   - **Multi-Agent System (MAS)**: Tasks are divided among collaborating agents, each focusing on a specific subtask; this reduces context redundancy and improves execution accuracy and output quality.
   - **Improved Rollback Mechanism**: A rule-based rollback mechanism keeps the depth-first search correct, so that even less capable models can use DFSDT effectively for tool planning.
   - **Combination of Macro and Micro Planning**: Task decomposition provides macro planning and DFSDT solves each subtask, avoiding premature termination.

### Experimental Validation

1. **Open-Ended Task: StableToolBench**:
   - The evaluation metrics are pass rate and win rate.
   - Smurfs achieved the best or near-best performance across multiple LLMs, and performed particularly well with Mistral-7B without any additional training.
2. **Closed-Ended Task: HotpotQA**:
   - The evaluation metric is the F1 score.
   - Without any training, Smurfs not only outperformed other untrained agents but in some cases also surpassed trained agents, demonstrating strong generalization and efficiency.

### Contribution Summary

1. **A novel plug-and-play multi-agent system framework.** Experiments show that the method is not only effective but also more cost-efficient than existing tool-planning methods.
2. **Ablation studies that reveal what makes the multi-agent framework effective**, providing insights for future research.

### Conclusion

Through the collaboration of a multi-agent system, the Smurfs framework significantly improves LLM performance on multi-tool planning tasks, addresses the limitations of existing methods, and lays a foundation for future research on multi-agent systems.
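The planning scheme summarized above, task decomposition for macro planning plus a depth-first search with a rule-based rollback for each subtask, can be illustrated with a small sketch. This is a generic depth-first search in which a fixed rule, rather than an LLM judgment, decides when to abandon a branch; it is not the paper's implementation. The rollback rule, the toy tools `search` and `extract`, and all function names (`should_roll_back`, `dfs`, `solve_task`, `is_solved`) are hypothetical, and tools are tried in a fixed order where a real system would have the LLM propose candidate tool calls.

```python
from typing import Callable, List, Optional, Tuple

# Assumed interface: a tool maps a query string to a result string.
Tool = Callable[[str], str]


def should_roll_back(result: str) -> bool:
    # Hypothetical rule: an empty result or an error string is a dead end.
    # The decision is made by a fixed rule, not by asking the LLM.
    return not result.strip() or result.startswith("ERROR")


def dfs(query: str, tools: List[Tuple[str, Tool]], is_solved: Callable[[str], bool],
        depth: int = 0, max_depth: int = 3) -> Optional[List[str]]:
    """Depth-first search over chains of tool calls.

    Returns the tool names on the first successful chain, or None once every
    branch up to max_depth has been exhausted.
    """
    if depth >= max_depth:
        return None
    for name, tool in tools:
        result = tool(query)
        if should_roll_back(result):
            continue  # roll back: abandon this branch and try the next sibling
        if is_solved(result):
            return [name]
        rest = dfs(result, tools, is_solved, depth + 1, max_depth)
        if rest is not None:
            return [name] + rest
    return None


def solve_task(subtasks: List[str], tools: List[Tuple[str, Tool]],
               is_solved: Callable[[str], bool]) -> List[Optional[List[str]]]:
    # Macro planning: the task arrives already decomposed into subtasks.
    # Micro planning: each subtask gets its own DFS, so a dead end in one
    # subtask does not end the search for the others.
    return [dfs(sub, tools, is_solved) for sub in subtasks]


# Toy tools for illustration only.
def search(q: str) -> str:
    return "ERROR: rate limited" if "weather" in q else f"page({q})"


def extract(q: str) -> str:
    return f"answer({q})" if q.startswith("page(") else "ERROR: bad input"


def is_solved(result: str) -> bool:
    return result.startswith("answer(")


TOOLS = [("extract", extract), ("search", search)]
print(solve_task(["population of Paris", "weather in Paris"], TOOLS, is_solved))
# prints [['search', 'extract'], None]
```

In the first subtask, `extract` fails on the raw query, the rule rolls back, and the search succeeds via `search` followed by `extract`. The second subtask hits only dead ends and returns `None`, while the first subtask's result is unaffected.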

Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning
