Abstract: Large Language Models (LLMs) for public use require continuous pre-training to remain up-to-date with the latest data. The models also need to be fine-tuned with specific instructions to maintain their ability to follow instructions accurately. Typically, LLMs are released in two versions: the base LLM, pre-trained on diverse data, and the instruction-refined LLM, additionally trained with specific instructions for better instruction following. The question arises as to which model should undergo continuous pre-training to maintain its instruction-following abilities while also staying current with the latest data. In this study, we delve into the intricate relationship between continuous pre-training and instruction fine-tuning of LLMs and investigate the impact of continuous pre-training on the instruction-following abilities of both the base model and its instruction fine-tuned counterpart. Further, the instruction fine-tuning process is computationally intense and requires a substantial number of hand-annotated examples for the model to learn effectively. This study aims to find the most compute-efficient strategy to gain up-to-date knowledge and instruction-following capabilities without requiring any instruction data or fine-tuning. We empirically prove our findings on the LLaMa 3, 3.1 and Qwen 2, 2.5 family of base and instruction models, providing a comprehensive exploration of our hypotheses across varying sizes of pre-training data corpus and different LLM settings.

What problem does this paper attempt to address?

The paper attempts to address the issue of how to balance continual pretraining and instruction fine-tuning in large language models (LLMs) to maintain the model's instruction-following capabilities while ensuring the model can keep up with the latest data. Specifically, the paper explores the following questions:

1. **Impact of continual pretraining on instruction-following capabilities**: What happens to the instruction-following capabilities of a model that has already been instruction fine-tuned when it undergoes continual pretraining?
2. **How to recover lost instruction-following capabilities**: If instruction-following capabilities are lost during continual pretraining, how can these capabilities be effectively restored?
3. **The necessity of resource-intensive instruction fine-tuning after updating the base model's knowledge**: After updating the base model's knowledge, is additional instruction fine-tuning necessary to recover or enhance instruction-following capabilities?

The paper investigates two different setups through experiments:

- **Setup 1**: Starting directly from an instruction fine-tuned LLM, using a new dataset for continual pretraining.
- **Setup 2**: First performing continual pretraining on the base model, then fine-tuning with the instruction dataset.

The main findings of the study include:

- Continual pretraining leads to a significant decline in the instruction-following capabilities of instruction fine-tuned models, thus continual pretraining on instruction fine-tuned models should be avoided.
- Performing continual pretraining on the base model followed by instruction fine-tuning can retain both domain knowledge and instruction-following capabilities.
- Instruction-following capabilities are transferable between models with the same ancestor and can be restored through simple parameter addition and subtraction operations.
- After continual pretraining on the base model, traditional instruction fine-tuning is not necessary; instead, instruction-following capabilities can be restored through capability transfer.

These findings provide important insights into how to efficiently maintain and enhance the instruction-following capabilities of large language models while keeping their knowledge up to date.
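
The "parameter addition and subtraction" mentioned in the findings can be illustrated with a small sketch. The summary does not state the exact formula, so the code below assumes the common task-vector form: subtract the original base weights from the instruction-tuned weights to get an "instruction vector", then add that vector to the continually pretrained base. The function name `transfer_instruction_vector`, the `scale` parameter, and the toy weights stored as plain lists of floats are all hypothetical choices for illustration, not the paper's implementation.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def transfer_instruction_vector(
    base: StateDict,
    instruct: StateDict,
    updated_base: StateDict,
    scale: float = 1.0,
) -> StateDict:
    """Return updated_base + scale * (instruct - base), parameter by parameter.

    Assumption: all three checkpoints share the same architecture, i.e. the
    same parameter names and shapes (models with the same ancestor).
    """
    if not (base.keys() == instruct.keys() == updated_base.keys()):
        raise ValueError("checkpoints must have identical parameter names")

    merged: StateDict = {}
    for name in base:
        b, i, u = base[name], instruct[name], updated_base[name]
        if not (len(b) == len(i) == len(u)):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # (i_k - b_k) is the instruction vector; add it to the updated base.
        merged[name] = [u_k + scale * (i_k - b_k) for b_k, i_k, u_k in zip(b, i, u)]
    return merged


# Hypothetical toy weights, chosen so the arithmetic is exact in floating point.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
instruct = {"layer.weight": [1.5, 1.0], "layer.bias": [0.75]}
updated_base = {"layer.weight": [2.0, 2.0], "layer.bias": [0.0]}

merged = transfer_instruction_vector(base, instruct, updated_base)
print(merged)
```

With real checkpoints the same element-wise arithmetic would be applied to tensors rather than lists; setting `scale` to `0.0` returns the continually pretrained base unchanged, which makes the contribution of the instruction vector easy to isolate.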

Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs

Maybe Only 0.5 Training Data Instruction Tuning

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs

Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

Evaluating Large Language Models at Evaluating Instruction Following

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Continual Instruction Tuning for Large Multimodal Models

Instruction Pre-Training: Language Models are Supervised Multitask Learners

InsCL: A Data-efficient Continual Learning Paradigm for Fine-tuning Large Language Models with Instructions

Instruction Mining: Instruction Data Selection for Tuning Large Language Models

Demystifying Instruction Mixing for Fine-tuning Large Language Models

LLaCA: Multimodal Large Language Continual Assistant

Efficient Continual Pre-training by Mitigating the Stability Gap

Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models

The Construction of Instruction-tuned LLMs for Finance without Instruction Data Using Continual Pretraining and Model Merging

FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance