AgentTuning: Enabling Generalized Agent Abilities for LLMs

Aohan Zeng, Mingdao Liu, Rui Lu, Bowen Wang, Xiao Liu, Yuxiao Dong, Jie Tang

2023-10-23

Abstract: Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is a lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities. In this work, we present AgentTuning, a simple and general method to enhance the agent abilities of LLMs while maintaining their general LLM capabilities. We construct AgentInstruct, a lightweight instruction-tuning dataset containing high-quality interaction trajectories. We employ a hybrid instruction-tuning strategy by combining AgentInstruct with open-source instructions from general domains. AgentTuning is used to instruction-tune the Llama 2 series, resulting in AgentLM. Our evaluations show that AgentTuning enables LLMs' agent capabilities without compromising general abilities. AgentLM-70B is comparable to GPT-3.5-turbo on unseen agent tasks, demonstrating generalized agent capabilities. We open-source AgentInstruct and the AgentLM-7B, 13B, and 70B models at https://github.com/THUDM/AgentTuning, serving as open and powerful alternatives to commercial LLMs for agent tasks.

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper asks how to improve large language models (LLMs) when they act as agents on complex real-world tasks. Current open-source LLMs, such as Llama, perform poorly on these agent tasks compared with commercial models like GPT-3.5 and GPT-4. The paper proposes AgentTuning, which pairs a lightweight instruction-tuning dataset called AgentInstruct with a hybrid instruction-tuning strategy, aiming to strengthen the agent capability of LLMs while preserving their general language abilities.

AgentInstruct contains high-quality interaction trajectories from six different agent tasks, with detailed chain-of-thought reasoning for each decision step. The dataset is built by collecting interaction trajectories with GPT-4 and filtering them by reward score. The researchers then mix AgentInstruct with general-domain instructions to fine-tune the Llama 2 series, producing a family of models called AgentLM.

Experimental results show that AgentLM performs well on both held-in agent tasks (those represented in the training data) and held-out agent tasks (those unseen during training). Its 70B version is on par with GPT-3.5 on unseen tasks without sacrificing performance on traditional NLP tasks. The research also finds that training on agent-task data alone reduces generalization, which highlights how much agent-task generalization depends on the general capability of the LLM. In summary, the paper proposes an approach for improving the general agent capability of LLMs, giving open-source LLMs a way to compete with commercial models while retaining their broad applicability across tasks.
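The two data steps described above, filtering trajectories by reward and then mixing agent data with general-domain instructions, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Trajectory` type, the function names, the reward threshold, and the mixing ratio are all hypothetical placeholders rather than values taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One recorded agent interaction (hypothetical schema for illustration)."""
    task: str          # name of the agent task the trajectory came from
    turns: list        # alternating environment/model messages
    reward: float      # score assigned to the finished trajectory


def filter_by_reward(trajectories, threshold):
    """Keep only trajectories whose reward meets the threshold."""
    return [t for t in trajectories if t.reward >= threshold]


def sample_mixture(agent_data, general_data, agent_ratio, n, seed=0):
    """Draw n training examples, each from the agent set with probability
    agent_ratio and from the general-domain set otherwise."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        source = agent_data if rng.random() < agent_ratio else general_data
        examples.append(rng.choice(source))
    return examples


if __name__ == "__main__":
    raw = [
        Trajectory("os", ["list files", "ls"], reward=1.0),
        Trajectory("db", ["count rows", "SELECT 1"], reward=0.3),
    ]
    # Threshold and ratio are placeholders, not values from the paper.
    agent_set = filter_by_reward(raw, threshold=1.0)
    general_set = ["general instruction A", "general instruction B"]
    batch = sample_mixture(agent_set, general_set, agent_ratio=0.2, n=8)
    print(len(agent_set), len(batch))
```

Sampling per example, rather than concatenating the two sets, keeps the agent-to-general proportion fixed regardless of how large each set is. That matters here because the summary reports that training on agent data alone hurts generalization.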
