Abstract: Tool learning enables Large Language Models (LLMs) to interact with the external environment by invoking tools, improving their accuracy and extending their capability scope. However, previous works predominantly focus on improving the model's tool-utilizing accuracy and its ability to generalize to new, unseen tools. They force LLMs to adapt excessively to specific tool-invoking patterns without considering the harm to the model's general performance. This deviates from real-world applications and from the original intention of integrating tools to enhance the model. To tackle this problem, we dissect the capability trade-offs by examining changes in hidden representations and the gradient-based importance scores of the model's components. Based on this analysis, we propose a Component Importance-based Tool-utilizing ability Injection method (CITI). Guided by the gradient-based importance scores of different components, CITI alleviates the capability conflicts caused by fine-tuning by applying distinct training strategies to different components. It applies Mixture-of-LoRA (MOLoRA) to important components. Meanwhile, it fine-tunes the parameters of a few components deemed less important in the LLM backbone, while keeping the other parameters frozen. CITI can effectively enhance the model's tool-utilizing capability without excessively compromising its general performance. Experimental results demonstrate that our approach achieves outstanding performance across a range of evaluation metrics.
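The component-wise split in the abstract can be illustrated with a small sketch. It assumes a first-order Taylor importance score, I(c) = Σ_i |θ_i · ∂L/∂θ_i| summed over the parameters of component c, with gradients taken from some backward pass; the abstract only says "gradient-based", so CITI's exact score, the data it is computed on, and how many components land in each group may differ. The functions `component_importance` and `assign_strategies`, the component names, and all numbers are hypothetical.

```python
from typing import Dict, List


def component_importance(params: Dict[str, List[float]],
                         grads: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each component by sum_i |theta_i * dL/dtheta_i|.

    Assumed scoring rule: a first-order Taylor estimate of how much the loss
    would change if the component's weights were zeroed.
    """
    return {
        name: sum(abs(p * g) for p, g in zip(weights, grads[name]))
        for name, weights in params.items()
    }


def assign_strategies(scores: Dict[str, float],
                      n_important: int,
                      n_unimportant: int) -> Dict[str, str]:
    """Map each component to a training strategy by importance rank.

    - top `n_important`: backbone weights frozen, MOLoRA adapters attached
    - bottom `n_unimportant`: full-parameter fine-tuning
    - everything else: frozen
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    plan: Dict[str, str] = {}
    for rank, name in enumerate(ranked):
        if rank < n_important:
            plan[name] = "molora"
        elif rank >= len(ranked) - n_unimportant:
            plan[name] = "full_finetune"
        else:
            plan[name] = "frozen"
    return plan


# Hypothetical per-component weights and gradients (flattened for brevity).
params = {
    "layer0.attn.q": [0.5, -1.0],
    "layer0.mlp.up": [2.0, 0.1],
    "layer1.attn.q": [0.01, 0.02],
    "layer1.mlp.up": [1.0, 1.0],
}
grads = {
    "layer0.attn.q": [0.2, 0.4],
    "layer0.mlp.up": [0.5, 1.0],
    "layer1.attn.q": [0.1, 0.1],
    "layer1.mlp.up": [0.05, -0.05],
}

scores = component_importance(params, grads)
plan = assign_strategies(scores, n_important=1, n_unimportant=1)
# layer0.mlp.up -> molora, layer1.attn.q -> full_finetune, the other two -> frozen
print(plan)
```

In a real model each "component" would be a weight matrix (for example an attention projection or an MLP layer) and the gradients would come from autograd; flat Python lists are used here only to keep the ranking logic visible.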

What problem does this paper attempt to address?

This paper aims to enhance the tool-calling ability of large language models (LLMs) without sacrificing their general performance. Existing research mainly focuses on improving how accurately models call tools and how well they generalize to new tools, but these methods often over-adjust the model to fit specific tool-calling patterns. The result is a decline in performance on other tasks, the so-called "catastrophic forgetting". This conflicts with practical requirements: tools should serve the model, rather than the model being reshaped entirely around the tools.

To meet this challenge, the authors propose a Component Importance-based Tool-utilizing ability Injection method (CITI). By analyzing changes in the model's hidden representations together with gradient-based importance scores, CITI identifies which components are crucial for improving tool-calling ability without unduly affecting the model's general performance. The specific techniques include:

1. **Mixture-of-LoRA (MOLoRA)**: For important components, MOLoRA adapters absorb the tool-calling knowledge, and a routing network distinguishes tool-related from non-tool-related inputs, reducing the impact on the model backbone.
2. **Unimportant Components Optimization (UCO)**: For less important components, full-parameter fine-tuning makes use of the additional parameter capacity.
3. **Three-stage training strategy**:
   - **Router Pre-training (RP)**: Pre-train the routing network so that it learns to distinguish tool-related from non-tool-related inputs.
   - **MOLoRA Improvement (MI)**: Fine-tune the MOLoRA adapters while keeping the model backbone frozen.
   - **Unimportant Components Optimization (UCO)**: Fine-tune a small number of unimportant components in the model backbone to further improve performance while preserving general ability.

The experimental results show that CITI performs well on two tool-learning datasets (API-Bank and ToolAlpaca). It not only enhances tool-calling ability but also significantly outperforms the baseline methods at preserving general performance. For example, on API-Bank, CITI's general performance is 7.59% higher than that of LoRA and 31.95% higher than that of full-parameter fine-tuning.
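The router and MOLoRA pieces can be sketched in pure Python. This is a simplified stand-in under stated assumptions, not CITI's implementation: the routing network is assumed to be a logistic gate p(tool | x) pre-trained on inputs labelled tool / non-tool (the RP stage), the LoRA experts are mixed by a softmax over a second linear map, and the whole mixture is scaled by the tool probability so that non-tool inputs mostly see the frozen backbone weight. `ToolRouter`, `MoLoRALinear`, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product m @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(z: Sequence[float]) -> Vector:
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class ToolRouter:
    """Logistic gate p(tool | x): a hypothetical stand-in for CITI's router."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def prob_tool(self, x: Sequence[float]) -> float:
        logit = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-logit))

    def pretrain(self, data: List[Tuple[Vector, int]],
                 lr: float = 0.5, epochs: int = 200) -> None:
        """RP stage (assumed form): SGD on binary cross-entropy, label 1 = tool input."""
        for _ in range(epochs):
            for x, label in data:
                err = self.prob_tool(x) - label  # gradient of BCE w.r.t. the logit
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err


class MoLoRALinear:
    """y = W x + p_tool(x) * sum_k g_k(x) * s * B_k A_k x, with W frozen."""

    def __init__(self, W: Matrix, experts: List[Tuple[Matrix, Matrix]],
                 expert_router: Matrix, tool_router: ToolRouter,
                 scaling: float = 1.0) -> None:
        self.W = W                          # frozen backbone weight (d_out x d_in)
        self.experts = experts              # (A_k: r x d_in, B_k: d_out x r) pairs
        self.expert_router = expert_router  # K x d_in logits for mixing experts
        self.tool_router = tool_router
        self.scaling = scaling

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.W, x)                          # backbone path
        p_tool = self.tool_router.prob_tool(x)         # near 0 for non-tool inputs
        mix = softmax(matvec(self.expert_router, x))   # expert weights g_k(x)
        for g_k, (A, B) in zip(mix, self.experts):
            delta = matvec(B, matvec(A, x))            # low-rank update B_k A_k x
            y = [yi + self.scaling * p_tool * g_k * di for yi, di in zip(y, delta)]
        return y


router = ToolRouter(dim=2)
router.pretrain([([1.0, 0.1], 1), ([0.9, 0.0], 1), ([0.1, 1.0], 0), ([0.0, 0.9], 0)])
layer = MoLoRALinear(
    W=[[1.0, 0.0], [0.0, 1.0]],
    experts=[([[1.0, 0.0]], [[1.0], [0.0]]), ([[0.0, 1.0]], [[0.0], [1.0]])],
    expert_router=[[0.0, 0.0], [0.0, 0.0]],
    tool_router=router,
)
print(layer.forward([1.0, 0.0]))  # tool-like input: LoRA update applied
print(layer.forward([0.0, 1.0]))  # non-tool input: stays close to W x = [0, 1]
```

Under this sketch, the MI stage would train only the `A_k`, `B_k`, and router parameters with `W` frozen, and the UCO stage would then unfreeze the few low-importance backbone components picked by the importance ranking. Gating the whole expert mixture by p(tool | x) is a design choice of this illustration, not a documented detail of CITI.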

CITI: Enhancing Tool Utilizing Ability in Large Language Models without Sacrificing General Performance
