Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Tingchen Fu, Deng Cai, Lemao Liu, Shuming Shi, Rui Yan
2024-05-22
Abstract: Supervised fine-tuning (SFT) on an instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to deteriorate in the later stages of the SFT process, echoing the phenomenon of alignment tax. Through our pilot study, we put forward the hypothesis that data biases are probably one cause of the phenomenon. To address the issue, we introduce a simple disperse-then-merge framework. Concretely, we disperse the instruction-following data into portions and train multiple sub-models on different data portions. We then merge the sub-models into a single one via model merging techniques. Despite its simplicity, our framework outperforms various sophisticated methods such as data curation and training regularization on a series of standard knowledge and reasoning benchmarks.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the observation that, during supervised fine-tuning (SFT) of large language models (LLMs), performance on standard knowledge and reasoning benchmarks tends to deteriorate in the later stages of training rather than keep improving, a phenomenon the authors liken to the "alignment tax." Based on a pilot study, the authors hypothesize that biases in the instruction-following data are probably one cause. To tackle this, the paper proposes a simple framework called Disperse-Then-Merge (DTM): the instruction-following data is split into several portions, a sub-model is trained on each portion, and the sub-models are then merged into a single model, with the aim of reducing the impact of data bias. The reported experiments show that this framework outperforms more sophisticated approaches such as data curation and training regularization on a series of standard knowledge and reasoning benchmarks.
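The disperse, train, and merge steps can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say which merging technique is used, so uniform parameter averaging is assumed here, and a two-parameter linear model fitted by gradient descent stands in for an LLM undergoing SFT. All function names (`disperse`, `train_sub_model`, `merge`, `disperse_then_merge`) are hypothetical.

```python
import random


def disperse(examples, num_portions, seed=0):
    """Shuffle the examples and split them into roughly equal portions."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_portions] for i in range(num_portions)]


def train_sub_model(init_params, portion, lr=0.05, epochs=500):
    """Toy stand-in for SFT: fit y = w*x + b on one portion by gradient descent."""
    w, b = init_params["w"], init_params["b"]
    n = len(portion)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in portion:
            err = (w * x + b) - y
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}


def merge(param_dicts):
    """Assumed merging rule: uniform averaging of each parameter."""
    keys = param_dicts[0].keys()
    return {k: sum(p[k] for p in param_dicts) / len(param_dicts) for k in keys}


def disperse_then_merge(init_params, examples, num_portions=4):
    """Every sub-model starts from the same initial parameters."""
    portions = disperse(examples, num_portions)
    sub_models = [train_sub_model(init_params, p) for p in portions]
    return merge(sub_models)


# Synthetic data drawn from y = 2x + 1 with a little noise.
rng = random.Random(1)
data = [(x / 10, 2.0 * (x / 10) + 1.0 + rng.gauss(0, 0.05)) for x in range(-20, 21)]
merged = disperse_then_merge({"w": 0.0, "b": 0.0}, data)
print(merged)
```

In a real setting each sub-model would be a copy of the same pretrained LLM fine-tuned on its own data portion, and the averaging would run over every tensor in the model's parameters. Starting all sub-models from the same initialization matters, since averaging the weights of unrelated networks is generally not meaningful.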