Abstract: This paper studies the evolving field of Continual Learning (CL) in large language models (LLMs), with a focus on strategies for efficient and sustainable training. Our primary emphasis is continual domain-adaptive pretraining, a process designed to let LLMs integrate new information from various domains while retaining previously learned knowledge and improving cross-domain knowledge transfer, without relying on domain identifiers. Previous studies mostly concentrate on a limited selection of tasks or domains and primarily aim to address forgetting. In contrast, our research evaluates how well LLMs adapt to changing data landscapes in practical scenarios. To this end, we introduce a new benchmark that measures the adaptability of LLMs to these evolving data environments and offers a comprehensive evaluation framework. We examine how model size affects learning efficacy and forgetting, and how the ordering and similarity of incoming domains affect knowledge transfer within these models. Our findings yield several key insights: (i) when the sequence of domains is semantically similar, continual pretraining lets LLMs specialize better in the current domain than stand-alone fine-tuning does; (ii) training across a diverse range of domains enhances both backward and forward knowledge transfer; and (iii) smaller models are particularly sensitive to continual pretraining, showing the highest rates of both forgetting and learning. We believe our research marks a shift towards a more realistic benchmark for investigating CL in LLMs and could play a key role in guiding future research in the field.
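The abstract refers to forgetting, backward transfer, and forward transfer without defining them. The sketch below shows one common way to quantify them. It adapts the backward/forward transfer definitions popularized in the continual-learning literature to perplexity, where lower is better. This is an assumption for illustration and not necessarily the metric used in the benchmark. It assumes one new domain per training stage. `ppl[i][j]` is the perplexity on domain j after stage i, and `base[j]` is the original model's perplexity on domain j. All function names are hypothetical.

```python
from typing import Sequence


def backward_transfer(ppl: Sequence[Sequence[float]]) -> float:
    """Mean perplexity change on earlier domains after the final stage.

    Assumption (illustrative only): ppl[i][j] is the perplexity on domain j
    after training stage i, with domain i introduced at stage i.
    Positive result: earlier domains improved (backward transfer).
    Negative result: earlier domains got worse (forgetting).
    """
    t = len(ppl)
    if t < 2:
        raise ValueError("need at least two training stages")
    final = ppl[t - 1]
    # Perplexity right after learning domain j minus perplexity at the end.
    gains = [ppl[j][j] - final[j] for j in range(t - 1)]
    return sum(gains) / len(gains)


def forward_transfer(ppl: Sequence[Sequence[float]], base: Sequence[float]) -> float:
    """Mean perplexity reduction on each domain *before* it is trained on.

    Assumption (illustrative only): base[j] is the perplexity of the
    original, not-yet-adapted model on domain j.
    Positive result: earlier domains already helped on unseen domain j.
    """
    t = len(ppl)
    if t < 2:
        raise ValueError("need at least two training stages")
    # Base-model perplexity minus perplexity just before domain j's stage.
    gains = [base[j] - ppl[j - 1][j] for j in range(1, t)]
    return sum(gains) / len(gains)


if __name__ == "__main__":
    # Hypothetical numbers for three domains, not results from the paper.
    ppl = [
        [10.0, 20.0, 30.0],
        [11.0, 12.0, 28.0],
        [12.0, 13.0, 15.0],
    ]
    base = [14.0, 22.0, 31.0]
    print(backward_transfer(ppl))       # -1.5 (some forgetting)
    print(forward_transfer(ppl, base))  # 2.5 (positive forward transfer)
```

Under this convention, a model that forgets heavily, such as the smaller models the abstract describes, would show a more negative backward-transfer score.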

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains

On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey

MDAPT: Multilingual Domain Adaptive Pretraining in a Single Model

Multi-Stage Pre-training for Low-Resource Domain Adaptation

Adapting a Language Model While Preserving Its General Knowledge

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

A Compact Pretraining Approach for Neural Language Models

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora

Adapting Large Language Models to Domains via Reading Comprehension

Exploring the Benefits of Domain-Pretraining of Generative Large Language Models for Chemistry

Domain-oriented Language Pre-training with Adaptive Hybrid Masking and Optimal Transport Alignment

Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI

Continual Post-Training of Language Models

Investigating Continual Pretraining in Large Language Models: Insights and Implications

Domain-Specific Pretraining of Language Models: A Comparative Study in the Medical Field

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers

Adaptation Odyssey in LLMs: Why Does Additional Pretraining Sometimes Fail to Improve?

Back-Translated Task Adaptive Pretraining: Improving Accuracy and Robustness on Text Classification