From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs

Yulong Liu, Yunlong Yuan, Chunwei Wang, Jianhua Han, Yongqiang Ma, Li Zhang, Nanning Zheng, Hang Xu
2024-02-28
Abstract: The distinction between humans and animals lies in the unique ability of humans to use and create tools. Tools empower humans to overcome physiological limitations, fostering the creation of magnificent civilizations. Similarly, enabling foundational models like Large Language Models (LLMs) with the capacity to learn external tool usage may serve as a pivotal step toward realizing artificial general intelligence. Previous studies in this field have predominantly pursued two distinct approaches to augment the tool invocation capabilities of LLMs. The first approach emphasizes the construction of relevant datasets for model fine-tuning. The second approach, in contrast, aims to fully exploit the inherent reasoning abilities of LLMs through in-context learning strategies. In this work, we introduce a novel tool invocation pipeline designed to control massive real-world APIs. This pipeline mirrors the human task-solving process, addressing complicated real-life user queries. At each step, we guide LLMs to summarize the achieved results and determine the next course of action. We term this pipeline 'from Summary to action', Sum2Act for short. Empirical evaluations of our Sum2Act pipeline on the ToolBench benchmark show significant performance improvements, outperforming established methods like ReAct and DFSDT. This highlights Sum2Act's effectiveness in enhancing LLMs for complex real-world tasks.
Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to enhance the ability of large language models (LLMs) to handle complex real-world tasks by calling open-world APIs. Specifically, it introduces a new tool-calling framework named Sum2Act, which controls large-scale real-world APIs to solve complex user queries. The framework mimics the way humans solve problems: at each step, it guides the LLM to summarize the results achieved so far and decide on the next action. In this way, Sum2Act improves the performance of LLMs on complex real-world tasks.

### Main Contributions

1. **Introduction of a new tool-calling framework**: The framework includes a router and a state manager, enabling large language models to explicitly monitor task progress and correct errors.
2. **Experimental results show superiority**: On the ToolBench benchmark, Sum2Act outperforms existing baseline methods, such as CoT and DFSDT, especially on complex real-world tasks.
3. **Expansion of the use of visual APIs**: By integrating open-world visual APIs, Sum2Act can also handle more diverse visual tasks.

### Method Overview

- **Overall architecture**: Sum2Act uses large language models and a wide range of open-world APIs to solve real-world tasks. It first uses a retriever to obtain the tools (or APIs) relevant to the user instruction, and then iterates between the action-proposal stage and the summary stage.
- **Action-proposal stage**: Based on the current state, the instruction, and the available tools, the router plans the next action and executes it. If the task is not yet completed, the router selects a specific tool or API and performs the corresponding operation; if the task is completed, it exits the loop and responds to the user's command.
- **Summary stage**: The state manager evaluates the observations returned by these actions and updates the overall state accordingly. It checks whether the new action successfully returned information related to the target task. If so, it records the new answer; otherwise, it records the reason for the failure and adds it to the failure history.

### Experimental Results

- **Evaluation on the ToolBench dataset**: The experimental results show that Sum2Act performs well on complex tasks, exceeding existing methods on both the Pass Rate and Win Rate metrics.
- **Case studies**: The effectiveness of Sum2Act is demonstrated through specific cases, such as successfully obtaining the version information of C code compilers, YouTube video information, weather forecasts, and flight data.

### Conclusion

By introducing a new tool-calling framework, Sum2Act significantly improves the ability of large language models to handle complex real-world tasks. The method outperforms existing methods and also shows strong practicality and flexibility in real applications.
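
The iteration between the action-proposal stage and the summary stage described in the Method Overview can be illustrated with a small Python sketch. This is not the authors' implementation: the names `State`, `retrieve_tools`, `propose_action`, `summarize`, and `sum2act_loop` are hypothetical, the retriever is a plain keyword match, and the router and state manager are rule-based stand-ins for what would be LLM calls in the paper. It only shows the control flow: retrieve tools, propose and execute an action, then record either an answer or a failure reason before the next step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Tool = Callable[[str], str]


@dataclass
class State:
    """Running summary kept by the state manager (field names are illustrative)."""
    answers: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)
    tried: Set[str] = field(default_factory=set)


def retrieve_tools(instruction: str, registry: Dict[str, Tool]) -> Dict[str, Tool]:
    """Stand-in retriever: keep tools whose name occurs in the instruction."""
    text = instruction.lower()
    return {name: fn for name, fn in registry.items() if name in text}


def propose_action(state: State, tools: Dict[str, Tool]) -> Optional[str]:
    """Stand-in router: pick the next untried tool, or None when the task is done."""
    for name in tools:
        if name not in state.tried:
            return name
    return None


def summarize(state: State, tool_name: str, observation: str) -> None:
    """Stand-in state manager: record a new answer or a failure reason."""
    state.tried.add(tool_name)
    if observation.startswith("ERROR"):
        state.failures.append(f"{tool_name}: {observation}")
    else:
        state.answers.append(f"{tool_name}: {observation}")


def sum2act_loop(instruction: str, registry: Dict[str, Tool], max_steps: int = 8) -> State:
    """Alternate between action proposal and summary until the router stops."""
    tools = retrieve_tools(instruction, registry)
    state = State()
    for _ in range(max_steps):
        tool_name = propose_action(state, tools)
        if tool_name is None:  # router decides the task is complete
            break
        try:
            observation = tools[tool_name](instruction)
        except Exception as exc:  # a failed API call becomes an observation
            observation = f"ERROR {exc}"
        summarize(state, tool_name, observation)
    return state


# Toy tools standing in for real-world APIs (hypothetical).
def weather(query: str) -> str:
    return "sunny, 21 C"


def flights(query: str) -> str:
    raise RuntimeError("service unavailable")


registry = {"weather": weather, "flights": flights, "youtube": lambda q: "unused"}
final_state = sum2act_loop("Check the weather and flights for tomorrow", registry)
print(final_state.answers)
print(final_state.failures)
```

Keeping answers and failure reasons in one explicit state object is what lets the next action-proposal step see both what has been achieved and what has already failed, which is the role the summary assigns to the state manager.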