CITI: Enhancing Tool Utilizing Ability in Large Language Models without Sacrificing General Performance

Yupu Hao, Pengfei Cao, Zhuoran Jin, Huanxuan Liao, Yubo Chen, Kang Liu, Jun Zhao
2024-09-23
Abstract: Tool learning enables the Large Language Models (LLMs) to interact with the external environment by invoking tools, enriching the accuracy and capability scope of LLMs. However, previous works predominantly focus on improving model's tool-utilizing accuracy and the ability to generalize to new, unseen tools, excessively forcing LLMs to adjust specific tool-invoking pattern without considering the harm to model's general performance. This deviates from the actual applications and original intention of integrating tools to enhance model. To tackle this problem, we dissect the capability trade-offs by examining the hidden representation changes and the gradient-based importance score of model's components. Based on the analysis result, we propose a Component Importance-based Tool-utilizing ability Injection method (CITI). According to the gradient-based importance score of different components, it alleviates the capability conflicts caused by fine-tuning process by applying distinct training strategies to different components. CITI applies Mixture-Of-LoRA (MOLoRA) for important components. Meanwhile, it fine-tunes the parameters of few components deemed less important in the backbone of the LLM, while keeping other parameters frozen. CITI can effectively enhance the model's tool-utilizing capability without excessively compromising its general performance. Experimental results demonstrate that our approach achieves outstanding performance across a range of evaluation metrics.
Computation and Language
What problem does this paper attempt to address?
This paper aims to enhance the tool-calling ability of large language models (LLMs) without sacrificing their general performance. Existing research focuses mainly on improving tool-calling accuracy and generalization to new tools, but these methods often over-adjust the model to fit specific tool-calling patterns. As a result, performance on other tasks declines, a phenomenon known as "catastrophic forgetting". This runs counter to practical needs: tools should serve the model, rather than the model being reshaped around the tools.

To address this challenge, the authors propose a Component Importance-based Tool-utilizing ability Injection method (CITI). By analyzing changes in the model's hidden representations and the gradient-based importance scores of its components, CITI identifies which components are crucial for improving tool-calling ability and which can be adjusted without much impact on general performance. Specific techniques include:

1. **Mixture-of-LoRA (MOLoRA)**: For important components, MOLoRA adapters absorb the tool-calling knowledge, and a routing network distinguishes tool-related from non-tool-related inputs, reducing the impact on the model backbone.
2. **Unimportant Components Optimization (UCO)**: For less important components, full-parameter fine-tuning is used so that more of the model's parameters can be put to work.
3. **Three-stage training strategy**:
   - **Router Pre-training (RP)**: Pre-train the routing network so that it learns to distinguish tool-related from non-tool-related inputs.
   - **MOLoRA Improvement (MI)**: Fine-tune the MOLoRA adapters while keeping the model backbone frozen.
   - **Unimportant Components Optimization (UCO)**: Fine-tune a small number of unimportant components in the model backbone to improve performance while maintaining general ability.

Experiments on two tool-learning datasets (API-Bank and ToolAlpaca) show that CITI enhances tool-calling ability while significantly outperforming the baseline methods in preserving general performance. For example, on the API-Bank dataset, the general performance of CITI is 7.59% higher than that of LoRA and 31.95% higher than that of full-parameter fine-tuning.
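
To make the MOLoRA idea above more concrete, here is a minimal pure-Python sketch of a frozen linear layer combined with a router-weighted mixture of LoRA experts. It is an illustration under stated assumptions, not the authors' implementation: the class name `MoLoRALinear`, the helper functions, the softmax gating, the single-matrix router, the zero initialization of the `B` matrices (borrowed from standard LoRA), and the toy dimensions are all hypothetical choices that this summary does not specify.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix used for toy weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MoLoRALinear:
    """Frozen linear layer plus a gated mixture of low-rank (LoRA) experts.

    Hypothetical sketch: output = W x + sum_i gate_i(x) * B_i (A_i x).
    """

    def __init__(self, d_in, d_out, rank=2, n_experts=2, seed=0):
        rng = random.Random(seed)
        self.weight = rand_matrix(d_out, d_in, rng)      # frozen backbone weight W
        self.router = rand_matrix(n_experts, d_in, rng)  # routing network (assumed linear)
        # Each expert is a low-rank pair (A, B). B starts at zero (assumption,
        # as in standard LoRA), so the layer initially matches the backbone.
        self.experts = [
            (rand_matrix(rank, d_in, rng), [[0.0] * rank for _ in range(d_out)])
            for _ in range(n_experts)
        ]

    def gates(self, x):
        """Per-expert mixing weights; softmax gating is an assumption."""
        return softmax(matvec(self.router, x))

    def forward(self, x):
        out = matvec(self.weight, x)  # backbone path, never modified
        for gate, (a, b) in zip(self.gates(x), self.experts):
            delta = matvec(b, matvec(a, x))  # low-rank update B (A x)
            out = [o + gate * d for o, d in zip(out, delta)]
        return out


layer = MoLoRALinear(d_in=4, d_out=3)
x = [1.0, -0.5, 0.25, 2.0]
untouched = layer.forward(x) == matvec(layer.weight, x)  # adapters start as a no-op
```

In this sketch only `router` and the expert pairs would be trained, which mirrors the summary's description of pre-training the router first and then improving the adapters while the backbone stays frozen. How the router is supervised to separate tool-related from non-tool-related inputs is not covered here.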