CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions

Jun Rao, Xuebo Liu, Lian Lian, Shengjun Cheng, Yunjie Liao, Min Zhang
2024-10-04
Abstract: With instruction tuning, Large Language Models (LLMs) can enhance their ability to adhere to commands. Diverging from most works focusing on data mixing, our study concentrates on enhancing the model's capabilities from the perspective of data sampling during training. Drawing inspiration from the human learning process, where it is generally easier to master solutions to similar topics through focused practice on a single type of topic, we introduce a novel instruction tuning strategy termed CommonIT: Commonality-aware Instruction Tuning. Specifically, we cluster instruction datasets into distinct groups with three proposed metrics (Task, Embedding and Length). We ensure each training mini-batch, or "partition", consists solely of data from a single group, which brings about both data randomness across mini-batches and intra-batch data similarity. Rigorous testing on LLaMa models demonstrates CommonIT's effectiveness in enhancing the instruction-following capabilities of LLMs through IT datasets (FLAN, CoT, and Alpaca) and models (LLaMa2-7B, Qwen2-7B, LLaMa 13B, and BLOOM 7B). CommonIT consistently yields an average improvement of 2.1% on the general domain (i.e., the average score of Knowledge, Reasoning, Multilinguality and Coding) with the Length metric, 5.2% on the special domain (i.e., GSM, Openfunctions and Code) with the Task metric, and 3.8% on specific tasks (i.e., MMLU) with the Embedding metric. Code is available at https://github.com/raojay7/CommonIT.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses two problems that arise when large language models (LLMs) are instruction-tuned on mixed data: inaccurate instruction understanding and degraded task execution. Specifically:

1. **Inaccurate instruction understanding**: Most existing methods perform instruction tuning on a mixture of data from different tasks. This can make it hard for the model to interpret the instruction of a specific task, which hurts its output (as shown in Figure 1). For example, the model may fail to recognize a translation instruction and simply reply with the final phrase.
2. **Decreased task-execution ability**: Because multiple tasks or diverse instructions are mixed together, the model's understanding of a specific task may drift, which lowers overall performance. The effect is more pronounced in multi-task settings, where the model performs well on some tasks but poorly on others.

To address these problems, the authors propose **CommonIT**, an instruction-tuning strategy based on data commonality. The method divides the instruction dataset into groups and ensures that each training mini-batch contains data from a single group only, with the aim of improving both instruction understanding and task-execution accuracy.

### Main contributions of CommonIT

1. **Proposing the CommonIT framework**: The framework uses data commonality to enhance the model's instruction-following ability. It includes three grouping strategies (Task, Embedding, Length) and introduces a batch-based constraint in the optimization process (§3).
2. **Wide applicability**: CommonIT applies across multiple dimensions, including different datasets, general and specialized domains, and various models. The paper also explores which grouping strategy is most appropriate in each scenario (§5.1).
3. **Analysis of improvement sources**: By examining commonality, the paper offers possible explanations for where the improvement comes from (§5.2 and §5.3).

According to the abstract, CommonIT achieves average improvements of 2.1% on the general domain (Length metric), 5.2% on the special domain (Task metric), and 3.8% on specific tasks such as MMLU (Embedding metric).
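
The single-group mini-batch constraint described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (see their repository for that). The function names `length_group` and `commonality_batches`, the `"instruction"` field, and the bucket boundaries are all hypothetical choices made for illustration. The sketch assumes a grouping function is given, partitions each group into mini-batches, and then shuffles the batches so that the order across mini-batches stays random while every batch remains homogeneous.

```python
import random
from collections import defaultdict


def length_group(example, boundaries=(32, 128)):
    """Hypothetical Length metric: bucket an example by whitespace token count.

    The boundaries are arbitrary illustration values, not taken from the paper.
    """
    n_tokens = len(example["instruction"].split())
    for bucket, upper in enumerate(boundaries):
        if n_tokens <= upper:
            return bucket
    return len(boundaries)


def commonality_batches(dataset, group_fn, batch_size, seed=0):
    """Build mini-batches in which every example shares the same group key.

    Returns a list of (group_key, batch) pairs in shuffled order.
    """
    rng = random.Random(seed)

    # 1. Partition the dataset into groups using the chosen metric.
    groups = defaultdict(list)
    for example in dataset:
        groups[group_fn(example)].append(example)

    # 2. Shuffle within each group, then cut it into mini-batches.
    batches = []
    for key, members in groups.items():
        rng.shuffle(members)
        for start in range(0, len(members), batch_size):
            batches.append((key, members[start:start + batch_size]))

    # 3. Shuffle the batches so consecutive steps may come from any group.
    rng.shuffle(batches)
    return batches


if __name__ == "__main__":
    toy = [{"instruction": "word " * n} for n in (3, 5, 10, 40, 60, 200, 300)]
    for key, batch in commonality_batches(toy, length_group, batch_size=2):
        print(key, [len(ex["instruction"].split()) for ex in batch])
```

Swapping the metric only requires a different `group_fn`. For instance, a Task-style grouping could return a task label stored with each example, and an Embedding-style grouping could return a cluster id computed beforehand; both are assumptions about how such metrics might be wired in, not details confirmed by the text above.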