Abstract: Pre-trained models are nowadays a fundamental component of machine learning research. In continual learning, they are commonly used to initialize the model before training on the stream of non-stationary data. However, pre-training is rarely applied during continual learning. We formalize and investigate the characteristics of the continual pre-training scenario in both language and vision environments, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned to different downstream tasks. We show that continually pre-trained models are robust against catastrophic forgetting and we provide strong empirical evidence supporting the fact that self-supervised pre-training is more effective in retaining previous knowledge than supervised protocols. Code is provided at https://github.com/AndreaCossu/continual-pretraining-nlp-vision.

What problem does this paper attempt to address?

This paper studies whether continual pre-training can mitigate catastrophic forgetting in continual learning (CL). In both natural language processing (NLP) and computer vision (CV) settings, a model is continually pre-trained on a stream of incoming data and only afterwards fine-tuned on different downstream tasks. The question is how well such a model retains previously acquired knowledge while still adapting to new data. The main contributions of the paper are as follows:

1. **Formalize the Continual Pre-Training Scenario**: The paper formally defines the continual pre-training scenario and describes an evaluation method that measures the impact of continual pre-training on catastrophic forgetting.
2. **Construct Evaluation Environments for NLP and CV**: To study the effect of continual pre-training comprehensively, the paper builds two evaluation environments, one for natural language processing and one for computer vision, and runs extensive experiments across different datasets, model architectures, and pre-training protocols.
3. **Show the Effectiveness of Unsupervised / Self-Supervised Pre-Training**: The experiments show that unsupervised or self-supervised pre-training protocols mitigate forgetting more effectively than supervised protocols. This indicates that continual pre-training can help the model adapt to new data while better retaining previously learned knowledge.
4. **Analyze the Changes in the Model Feature Space**: Through linear evaluation of the model feature space and centered kernel alignment (CKA) analysis, the paper further verifies that supervised pre-training leads to greater feature drift, while self-supervised pre-training better preserves feature consistency.
In conclusion, the paper's experiments and analyses show that continual pre-training is an effective strategy: when facing a continuous data stream, the model retains old knowledge while learning new knowledge, without significant additional cost. This matters in both theory and practice for applications that must adapt to changing environments over long periods, such as online learning systems and robotics.
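The centered kernel alignment (CKA) analysis mentioned in the summary compares the features a model produces before and after further training. The following is a minimal sketch of the standard linear CKA formula, not the paper's own implementation. The function name `linear_cka` and the synthetic feature matrices are assumptions made for illustration; in practice the inputs would be activations extracted from the same layer of two model checkpoints on the same inputs.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two feature matrices of shape (n_samples, n_features).

    Both matrices must describe the same samples in the same order.
    Returns a value in [0, 1]; 1 means the representations match up to
    rotation and isotropic scaling.
    """
    # Center each feature dimension over the samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for layer activations (hypothetical data).
rng = np.random.default_rng(0)
feats_before = rng.normal(size=(200, 16))

# A rotated and rescaled copy: same representation geometry.
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
feats_rotated = 3.0 * feats_before @ q

# A heavily perturbed copy: simulates feature drift.
feats_drifted = feats_before + 2.0 * rng.normal(size=(200, 16))

print("identical:", linear_cka(feats_before, feats_before))
print("rotated:  ", linear_cka(feats_before, feats_rotated))
print("drifted:  ", linear_cka(feats_before, feats_drifted))
```

Under this measure, a lower score between checkpoints of the same layer indicates larger feature drift, which is the sense in which the summary contrasts supervised and self-supervised pre-training.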

Continual Pre-Training Mitigates Forgetting in Language and Vision

Continual Learning with Pretrained Backbones by Tuning in the Input Space

Examining Forgetting in Continual Pre-training of Aligned Large Language Models

Investigating Continual Pretraining in Large Language Models: Insights and Implications

Don't Stop Learning: Towards Continual Learning for the CLIP Model

Improving Representational Continuity via Continued Pretraining

Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

RanPAC: Random Projections and Pre-trained Models for Continual Learning

Enhancing Visual Continual Learning with Language-Guided Supervision

Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection

Class Incremental Learning with Pre-trained Vision-Language Models

Pretrained Language Model in Continual Learning: A Comparative Study

Continual Learning of Image Classes with Language Guidance from a Vision-Language Model

Continual Forgetting for Pre-trained Vision Models

Prior-Free Continual Learning with Unlabeled Data in the Wild

Controlling Forgetting with Test-Time Data in Continual Learning

CTP: Towards Vision-Language Continual Pretraining Via Compatible Momentum Contrast and Topology Preservation

A Practitioner's Guide to Continual Multimodal Pretraining

An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates

Adaptive Progressive Continual Learning

Towards a General Framework for Continual Learning with Pre-training