Abstract: Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks (e.g., MMLU, TruthfulQA, and HumanEval) and open-ended generation benchmarks (e.g., MT-Bench and AlpacaEval). Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%. We identify two key factors influencing the effectiveness of IM: (1) the ratio between instruction length and output length in the training data; and (2) the number of training examples. We observe that IM is especially beneficial when trained on datasets with lengthy instructions paired with brief outputs, or under the Superficial Alignment Hypothesis (SAH), where a small number of training examples is used for instruction tuning. Further analysis substantiates our hypothesis that our improvement can be attributed to reduced overfitting to instruction tuning datasets. It is worth noting that we are not proposing IM as a replacement for current fine-tuning processes. Instead, our work aims to provide practical guidance for instruction tuning LMs, especially in low-resource scenarios.
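
The difference between the two objectives described in the abstract can be illustrated with a toy loss computation. This is a minimal sketch, not the authors' implementation: the helper names `masked_nll` and `build_mask`, the per-token log-probabilities, and the token counts are all hypothetical, and real training code would take the log-probabilities from a model and may treat template tokens differently.

```python
from typing import List, Sequence


def masked_nll(token_logprobs: Sequence[float], loss_mask: Sequence[int]) -> float:
    """Mean negative log-likelihood over the positions where loss_mask is 1."""
    if len(token_logprobs) != len(loss_mask):
        raise ValueError("token_logprobs and loss_mask must have equal length")
    n_selected = sum(loss_mask)
    if n_selected == 0:
        raise ValueError("loss_mask selects no tokens")
    total = sum(lp for lp, m in zip(token_logprobs, loss_mask) if m)
    return -total / n_selected


def build_mask(n_instruction: int, n_output: int, include_instruction: bool) -> List[int]:
    """Build a 0/1 mask: output tokens always count, instruction tokens optionally."""
    instruction_part = [1 if include_instruction else 0] * n_instruction
    return instruction_part + [1] * n_output


# Hypothetical log-probabilities: 3 instruction tokens followed by 2 output tokens.
logprobs = [-2.0, -1.0, -3.0, -0.5, -1.5]

# Conventional instruction tuning: loss over the output tokens only.
output_only_loss = masked_nll(logprobs, build_mask(3, 2, include_instruction=False))

# IM-style objective: loss over the instruction tokens as well.
with_instruction_loss = masked_nll(logprobs, build_mask(3, 2, include_instruction=True))

print(output_only_loss, with_instruction_loss)
```

With a short output and a long instruction, most of the supervised tokens in the second objective come from the instruction, which is one way to read the abstract's observation that IM helps most on such data.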

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses how to improve the performance of language models (LMs) on natural language processing (NLP) tasks and open-ended generation benchmarks by changing the instruction tuning objective. Specifically, the authors propose **Instruction Modelling (IM)**, which applies the loss function to the instruction or prompt part as well as the output part during training and, in many scenarios, improves model performance.

### Main Contributions

1. **Proposing Instruction Modelling (IM)**: Through extensive experiments, IM is shown to improve the performance of language models trained on various instruction tuning datasets in many scenarios, with a gain of over 100% on the AlpacaEval 1.0 benchmark in the most advantageous case.
2. **Identifying Key Factors Affecting IM Effectiveness**: These are the ratio of instruction length to output length and the number of training samples. IM is particularly suitable for datasets where the instructions are long and the outputs are short, or for instruction tuning in resource-limited situations.
3. **Explaining the Effective Mechanism of IM**: IM improves model performance on various tasks by reducing overfitting. Experimental results show that IM has higher loss during training but lower loss during testing, indicating better generalization ability.

### Experimental Results

- **NLP Tasks**: IM performs well on multiple NLP tasks, particularly in multilingual understanding and common-sense reasoning.
- **Open-Ended Generation Benchmarks**: IM also achieves performance improvements on open-ended generation benchmarks such as MT-Bench and AlpacaEval.
- **Overfitting Issues**: IM improves the model's generalization ability by reducing overfitting to the training data, as reflected in higher training loss but lower test loss.

### Key Findings

1. **Ratio of Instruction Length to Output Length**: IM performs exceptionally well on datasets where the instructions are long and the outputs are short, such as Code Alpaca and Less MMLU Chat.
2. **Number of Training Samples**: IM performs better with fewer training samples, which is particularly important in resource-limited scenarios.
3. **Overfitting Issues**: IM reduces overfitting to the training data, which improves the model's generalization ability.

### Conclusion

This paper proposes the Instruction Modelling (IM) method, providing an effective means to improve the performance of language models on instruction tuning tasks, especially in resource-limited situations. IM improves the model's generalization ability by reducing overfitting, offering practical guidance for future language model research.
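
The first key factor, the ratio of instruction length to output length, can be estimated for a dataset before deciding whether IM is worth trying. The sketch below makes several assumptions that are not from the paper: the field names `instruction` and `output`, whitespace splitting as a stand-in for a real tokenizer, and an aggregate ratio (total instruction tokens over total output tokens) rather than a per-example average.

```python
from typing import Callable, Dict, List


def length_ratio(
    examples: List[Dict[str, str]],
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> float:
    """Total instruction tokens divided by total output tokens for a dataset."""
    total_instruction = sum(count_tokens(ex["instruction"]) for ex in examples)
    total_output = sum(count_tokens(ex["output"]) for ex in examples)
    if total_output == 0:
        raise ValueError("dataset has no output tokens")
    return total_instruction / total_output


# Hypothetical toy dataset; the field names are assumptions for illustration.
toy_dataset = [
    {"instruction": "Summarize the following passage in one word", "output": "Brief"},
    {"instruction": "Name a prime number", "output": "Seven is prime"},
]

ratio = length_ratio(toy_dataset)
print(ratio)
```

A ratio well above 1 corresponds to the long-instruction, short-output regime in which the summary reports IM to be most helpful; the threshold at which it starts to matter is not specified here.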

Instruction Tuning With Loss Over Instructions
