Instruction Tuning With Loss Over Instructions

Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani
2024-10-03
Abstract: Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can effectively improve LM performance on both NLP tasks (e.g., MMLU, TruthfulQA, and HumanEval) and open-ended generation benchmarks (e.g., MT-Bench and AlpacaEval). Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%. We identify two key factors influencing the effectiveness of IM: (1) the ratio between instruction length and output length in the training data; and (2) the number of training examples. We observe that IM is especially beneficial when trained on datasets with lengthy instructions paired with brief outputs, or under the Superficial Alignment Hypothesis (SAH) setting, where only a small number of training examples are used for instruction tuning. Further analysis substantiates our hypothesis that our improvement can be attributed to reduced overfitting to instruction tuning datasets. It is worth noting that we are not proposing IM as a replacement for current fine-tuning processes. Instead, our work aims to provide practical guidance for instruction tuning LMs, especially in low-resource scenarios.
Computation and Language, Artificial Intelligence
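The abstract's core idea is that IM changes which tokens contribute to the training loss. The following is a minimal sketch, not the authors' implementation, assuming per-token log-probabilities have already been computed for a sequence laid out as instruction tokens followed by output tokens. The function name `sequence_loss` and its parameters are hypothetical, and averaging per example is an assumed simplification (training code typically averages over all loss tokens in a batch).

```python
from typing import Sequence


def sequence_loss(
    token_logprobs: Sequence[float],
    instruction_len: int,
    loss_over_instructions: bool,
) -> float:
    """Mean negative log-likelihood over the tokens that receive loss.

    Assumed input layout (hypothetical): token_logprobs[i] is the model's
    log-probability for predicted token i of the concatenated
    instruction + output sequence, with the first `instruction_len`
    entries belonging to the instruction/prompt.
    """
    if not 0 <= instruction_len <= len(token_logprobs):
        raise ValueError("instruction_len must lie within the sequence")
    # Standard instruction tuning masks the instruction tokens out of the loss;
    # Instruction Modelling (IM) keeps them, so every token contributes.
    start = 0 if loss_over_instructions else instruction_len
    selected = list(token_logprobs[start:])
    if not selected:
        raise ValueError("no tokens receive loss")
    return -sum(selected) / len(selected)


# Toy sequence: three instruction tokens the model finds hard to predict,
# followed by one easy output token.
logprobs = [-2.0, -2.0, -2.0, -0.5]
print(sequence_loss(logprobs, instruction_len=3, loss_over_instructions=False))  # 0.5
print(sequence_loss(logprobs, instruction_len=3, loss_over_instructions=True))   # 1.625
```

The toy numbers only illustrate the mechanics of the two masking choices; they say nothing about the results reported in the paper.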
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses how to improve the performance of language models (LMs) on natural language processing (NLP) tasks and open-ended generation benchmarks by refining the instruction tuning method. Specifically, the authors propose **Instruction Modelling (IM)**, which applies the loss function not only to the output part but also to the instruction or prompt part during training, and which improves model performance in many scenarios.

### Main Contributions

1. **Proposing Instruction Modelling (IM)**: Extensive experiments show that, in many scenarios, IM improves the performance of LMs trained on various instruction tuning datasets; in the most advantageous case, it improves performance on the AlpacaEval 1.0 benchmark by over 100%.
2. **Identifying Key Factors Affecting IM Effectiveness**: These are the ratio of instruction length to output length in the training data and the number of training examples. IM is particularly suitable for datasets with long instructions and short outputs, or for instruction tuning in low-resource settings.
3. **Explaining Why IM Works**: IM improves performance on various tasks by reducing overfitting. Experimental results show that IM has higher training loss but lower test loss, indicating better generalization.

### Experimental Results

- **NLP Tasks**: IM performs well on multiple NLP tasks, particularly multilingual understanding and commonsense reasoning.
- **Open-Ended Generation Benchmarks**: IM also achieves notable improvements on open-ended generation benchmarks such as MT-Bench and AlpacaEval.
- **Overfitting**: IM reduces overfitting to the training data; its higher training loss paired with lower test loss indicates that it generalizes better.

### Key Findings

1. **Ratio of Instruction Length to Output Length**: IM performs especially well on datasets with long instructions and short outputs, such as Code Alpaca and Less MMLU Chat.
2. **Number of Training Examples**: IM performs better with fewer training examples, which matters most in low-resource scenarios.
3. **Overfitting**: IM improves generalization by reducing overfitting to the training data, as reflected in higher training loss but lower test loss.

### Conclusion

This paper proposes Instruction Modelling (IM), an effective way to improve LM performance on instruction tuning tasks, especially in low-resource situations. IM improves generalization by reducing overfitting. The authors present it not as a replacement for current fine-tuning processes but as practical guidance for instruction tuning LMs.
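Key Finding 1 ties IM's benefit to the ratio of instruction length to output length. The sketch below computes that ratio for a small dataset and counts how many tokens receive loss under each scheme. It rests on stated assumptions: `Example` and `length_stats` are hypothetical names, whitespace splitting stands in for a real tokenizer, and the ratio is taken over total token counts, which may differ from the paper's exact definition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    instruction: str  # instruction/prompt text
    output: str       # target response text


def length_stats(examples: List[Example]) -> Dict[str, float]:
    """Instruction-to-output length ratio and supervised-token counts.

    Whitespace splitting is a crude stand-in for a real tokenizer, and the
    ratio is computed from total token counts (an assumed definition).
    """
    instr_tokens = sum(len(ex.instruction.split()) for ex in examples)
    out_tokens = sum(len(ex.output.split()) for ex in examples)
    if out_tokens == 0:
        raise ValueError("all outputs are empty")
    return {
        "instruction_to_output_ratio": instr_tokens / out_tokens,
        # Standard instruction tuning only supervises output tokens.
        "loss_tokens_standard": out_tokens,
        # IM supervises instruction and output tokens alike.
        "loss_tokens_im": instr_tokens + out_tokens,
    }


data = [
    Example("Read the passage and answer with one letter: A, B, C, or D.", "B"),
    Example("Translate to French: good morning", "bonjour"),
]
print(length_stats(data))
# {'instruction_to_output_ratio': 9.0, 'loss_tokens_standard': 2, 'loss_tokens_im': 20}
```

On this toy data, output-only training supervises 2 tokens while IM supervises 20, which illustrates why the long-instruction, short-output regime is where the paper reports IM helping most.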