Abstract: During pretraining, large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora. In later stages such as fine-tuning and inference, however, the model may encounter knowledge not covered in pretraining, which can lead to hallucinations and degraded performance. This issue significantly affects the model's capabilities, because any model will inevitably face out-of-scope knowledge after pretraining. Fine-tuning is often required to adapt LLMs to domain-specific tasks, yet the same issue limits the model's ability to learn and integrate new information during fine-tuning. The effectiveness of fine-tuning largely depends on the type of knowledge involved. Existing research suggests that fine-tuning on partially mastered knowledge (for instance, question-answer pairs that the model has some chance of answering correctly under non-greedy decoding) enables the model to acquire new knowledge while mitigating hallucination. However, this approach can still cause the model to forget fully mastered knowledge, and it restricts the fine-tuning dataset to a narrower range, limiting the model's overall potential for improvement. Given the model's intrinsic reasoning abilities and the interconnectedness of different knowledge areas, we hypothesize that as the model's capacity to use existing knowledge improves during fine-tuning, previously unmastered knowledge may become easier for the model to understand. To explore this hypothesis, we conducted experiments and, based on the results, propose a two-stage fine-tuning strategy. This strategy improves the model's overall test accuracy and knowledge retention while preserving its accuracy on previously mastered content. When fine-tuning on the WikiQA dataset, our method increases the amount of knowledge acquired by the model in this stage by 24%.
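
The notion of partially mastered knowledge can be illustrated with a minimal sketch. It assumes that each question has already been answered several times with greedy decoding (for example, under different prompts) and several times with sampling, and that correctness is judged by a case-insensitive exact match. The function name `mastery_level`, the three category labels, and the matching rule are hypothetical choices made for illustration; they are not taken from the paper.

```python
from typing import List


def mastery_level(greedy_answers: List[str],
                  sampled_answers: List[str],
                  gold: str) -> str:
    """Assign a question-answer pair to a hypothetical mastery category.

    greedy_answers:  answers produced with greedy decoding
    sampled_answers: answers produced with non-greedy (sampled) decoding
    gold:            the reference answer
    """
    def is_correct(answer: str) -> bool:
        # Assumption: case-insensitive exact match stands in for
        # whatever correctness check a real pipeline would use.
        return answer.strip().lower() == gold.strip().lower()

    # Fully mastered: every greedy answer is correct.
    if greedy_answers and all(is_correct(a) for a in greedy_answers):
        return "fully_mastered"

    # Partially mastered: at least one answer (greedy or sampled) is correct,
    # so the model has some chance of answering correctly.
    if any(is_correct(a) for a in greedy_answers + sampled_answers):
        return "partially_mastered"

    # Unmastered: no decoding strategy produced the reference answer.
    return "unmastered"


if __name__ == "__main__":
    print(mastery_level(["Lyon"], ["Paris", "Lyon"], "Paris"))
```

Under these assumptions, a fine-tuning set restricted to partially mastered knowledge would keep only the pairs for which this function returns `"partially_mastered"`. How the proposed two-stage strategy selects or re-categorizes data between stages is not specified in the abstract, so it is not sketched here.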
