Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs

Ishan Jindal, Chandana Badrinath, Pranjal Bharti, Lakkidi Vinay, Sachin Dev Sharma
2024-10-15
Abstract: Large Language Models (LLMs) for public use require continuous pre-training to remain up-to-date with the latest data. The models also need to be fine-tuned with specific instructions to maintain their ability to follow instructions accurately. Typically, LLMs are released in two versions: the Base LLM, pre-trained on diverse data, and the instruction-refined LLM, additionally trained with specific instructions for better instruction following. The question arises as to which model should undergo continuous pre-training to maintain its instruction-following abilities while also staying current with the latest data. In this study, we delve into the intricate relationship between continuous pre-training and instruction fine-tuning of LLMs and investigate the impact of continuous pre-training on the instruction-following abilities of both the base model and its instruction fine-tuned counterpart. Further, the instruction fine-tuning process is computationally intense and requires a substantial number of hand-annotated examples for the model to learn effectively. This study aims to find the most compute-efficient strategy to gain up-to-date knowledge and instruction-following capabilities without requiring any instruction data and fine-tuning. We empirically prove our findings on the LLaMa 3, 3.1 and Qwen 2, 2.5 family of base and instruction models, providing a comprehensive exploration of our hypotheses across varying sizes of pre-training data corpus and different LLM settings.
Computation and Language
What problem does this paper attempt to address?
The paper addresses how to balance continual pretraining and instruction fine-tuning in large language models (LLMs) so that a model keeps its instruction-following capabilities while staying current with the latest data. Specifically, it explores the following questions:

1. **Impact of continual pretraining on instruction-following capabilities**: What happens to the instruction-following capabilities of a model that has already been instruction fine-tuned when it undergoes continual pretraining?
2. **How to recover lost instruction-following capabilities**: If instruction-following capabilities are lost during continual pretraining, how can these capabilities be effectively restored?
3. **The necessity of resource-intensive instruction fine-tuning after updating the base model's knowledge**: After updating the base model's knowledge, is additional instruction fine-tuning necessary to recover or enhance instruction-following capabilities?

The paper investigates two setups experimentally:

- **Setup 1**: Starting directly from an instruction fine-tuned LLM and continually pretraining it on a new dataset.
- **Setup 2**: First performing continual pretraining on the base model, then fine-tuning with the instruction dataset.

The main findings of the study are:

- Continual pretraining leads to a significant decline in the instruction-following capabilities of instruction fine-tuned models; continual pretraining on instruction fine-tuned models should therefore be avoided.
- Performing continual pretraining on the base model followed by instruction fine-tuning can retain both domain knowledge and instruction-following capabilities.
- Instruction-following capabilities are transferable between models with the same ancestor and can be restored through simple parameter addition and subtraction operations.
- After continual pretraining on the base model, traditional instruction fine-tuning is not necessary; instead, instruction-following capabilities can be restored through capability transfer.

These findings provide important insights into how to efficiently maintain and enhance the instruction-following capabilities of large language models while keeping their knowledge up to date.
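
The "parameter addition and subtraction" mentioned in the findings can be illustrated with a small sketch. The summary does not spell out the exact formula, so the code below assumes one plausible reading: the difference between the instruction-tuned weights and the original base weights is computed per parameter, and that difference is then added to the continually pretrained base. The function names, the `Weights` type, and the toy values are all hypothetical, and plain Python lists stand in for real model tensors.

```python
from typing import Callable, Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flattened values.
# In practice these would be tensors from a framework checkpoint (assumption).
Weights = Dict[str, List[float]]


def _combine(a: Weights, b: Weights, op: Callable[[float, float], float]) -> Weights:
    """Apply `op` element-wise to two checkpoints with identical layouts."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must share parameter names (same ancestor)")
    out: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        out[name] = [op(x, y) for x, y in zip(a[name], b[name])]
    return out


def instruction_residual(instruct: Weights, base: Weights) -> Weights:
    """Subtraction step: instruction-tuned weights minus original base weights."""
    return _combine(instruct, base, lambda x, y: x - y)


def transfer_instruction_capability(updated_base: Weights, residual: Weights) -> Weights:
    """Addition step: add the residual to the continually pretrained base."""
    return _combine(updated_base, residual, lambda x, y: x + y)


# Hypothetical toy checkpoints (values chosen only for illustration).
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
instruct = {"layer.weight": [1.5, 1.0], "layer.bias": [0.75]}
updated_base = {"layer.weight": [2.0, 4.0], "layer.bias": [1.0]}

residual = instruction_residual(instruct, base)
restored = transfer_instruction_capability(updated_base, residual)
print(restored)
```

The key-equality check reflects the "same ancestor" condition from the findings: the arithmetic is only meaningful when both checkpoints share the same architecture and parameter layout. Whether the paper scales or otherwise adjusts the residual before adding it is not stated in this summary.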