AgentTuning: Enabling Generalized Agent Abilities for LLMs

Aohan Zeng,Mingdao Liu,Rui Lu,Bowen Wang,Xiao Liu,Yuxiao Dong,Jie Tang
2023-10-23
Abstract:Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities. In this work, we present AgentTuning, a simple and general method to enhance the agent abilities of LLMs while maintaining their general LLM capabilities. We construct AgentInstruct, a lightweight instruction-tuning dataset containing high-quality interaction trajectories. We employ a hybrid instruction-tuning strategy by combining AgentInstruct with open-source instructions from general domains. AgentTuning is used to instruction-tune the Llama 2 series, resulting in AgentLM. Our evaluations show that AgentTuning enables LLMs' agent capabilities without compromising general abilities. The AgentLM-70B is comparable to GPT-3.5-turbo on unseen agent tasks, demonstrating generalized agent capabilities. We open source the AgentInstruct and AgentLM-7B, 13B, and 70B models at <a class="link-external link-https" href="https://github.com/THUDM/AgentTuning" rel="external noopener nofollow">this https URL</a>, serving open and powerful alternatives to commercial LLMs for agent tasks.
Computation and Language,Artificial Intelligence,Machine Learning
What problem does this paper attempt to address?
This paper focuses on how to enhance the capability of large-scale language models (LLMs) as proxies in handling complex real-world tasks. Current open-source LLMs, such as Llama, perform poorly in these proxy tasks compared to commercial models like GPT-3.5 and GPT-4. The paper proposes a method called AgentTuning, which includes a lightweight instruction fine-tuning dataset called AgentInstruct and a mixed instruction fine-tuning strategy, aimed at strengthening the proxy capability of LLMs while maintaining their general language abilities. AgentInstruct contains high-quality interaction trajectories from six different proxy tasks, with detailed chain-of-thought reasoning for each decision step. This dataset is constructed by collecting interaction trajectories with GPT-4 and filtering them based on reward scores. Then, the researchers adopt a mixed strategy, combining AgentInstruct with instructions from the general domain, to fine-tune LLMs and generate a series of models called AgentLM. Experimental results show that AgentLM performs well in both internal and external proxy tasks, with its 70B version being on par with GPT-3.5 on unseen tasks without sacrificing its performance on traditional NLP tasks. Furthermore, the research also finds that training with only proxy task data leads to a decrease in generalization performance, highlighting the importance of the general capability of LLMs for proxy task generalization. In summary, the paper proposes a new approach to enhance the general proxy capability of LLMs, providing possibilities for open-source LLMs to compete with commercial models while maintaining their wide applicability across various tasks.