Abstract: Modern pre-training and fine-tuning techniques have significantly improved model performance on downstream tasks. However, this improvement is harder to sustain when a pre-trained model must adapt sequentially to multiple downstream tasks while the training data shifts continuously. In this study, we aim to leverage the general capabilities of pre-trained models for knowledge sharing across tasks while endowing them with the capability for continual learning. To this end, we propose a Hypernetwork-based Parameter-Efficient Fine-Tuning (HyperPEFT) framework. Using a pre-trained Vision Transformer (ViT) as the backbone, HyperPEFT can incorporate various PEFT techniques, enabling the pre-trained ViT to adapt to diverse downstream tasks. The core of our method is the use of hypernetworks, which efficiently encapsulate task-specific information, significantly reducing task interference and protecting the model against catastrophic forgetting. The adoption of PEFT techniques allows precise adjustments to the pre-trained model, improving its performance on each specific task. Moreover, because a single shared hypernetwork produces the task-specific adjustments, the framework facilitates knowledge sharing across tasks. Extensive experiments show that our method effectively mitigates catastrophic forgetting, outperforms comparison methods, and uncovers latent associations among tasks. Overall, this study introduces a unified strategy that combines the general capabilities of pre-trained models with the adaptability required in continual learning scenarios.
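
To make the idea of a shared hypernetwork producing task-specific PEFT parameters concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a LoRA-style low-rank update as the PEFT technique and a single linear hypernetwork conditioned on a learned task embedding; the abstract specifies neither. The class `HyperLoRA`, the helpers, and all dimensions are hypothetical, and training is omitted. Pure Python is used only to keep the example self-contained.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random Gaussian matrix, standing in for learned or pre-trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class HyperLoRA:
    """Hypothetical sketch: one shared hypernetwork maps a task embedding
    to a low-rank update (B, A) for a single frozen linear layer."""

    def __init__(self, d_in, d_out, rank, emb_dim, num_tasks, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Stand-in for one frozen pre-trained weight matrix (never updated).
        self.W_frozen = rand_matrix(d_out, d_in, rng)
        # One embedding per task (assumption: this is the task-specific state).
        self.task_emb = rand_matrix(num_tasks, emb_dim, rng, scale=1.0)
        # Shared hypernetwork heads: embedding -> flattened A and B factors.
        self.head_A = rand_matrix(rank * d_in, emb_dim, rng)
        self.head_B = rand_matrix(d_out * rank, emb_dim, rng)

    def generate(self, task_id):
        """Produce the task-specific low-rank factors from the shared heads."""
        e = self.task_emb[task_id]
        a = matvec(self.head_A, e)
        b = matvec(self.head_B, e)
        A = [a[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        B = [b[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return A, B

    def forward(self, x, task_id):
        """Compute W x + B (A x): frozen path plus task-specific update."""
        A, B = self.generate(task_id)
        base = matvec(self.W_frozen, x)
        delta = matvec(B, matvec(A, x))
        return [p + q for p, q in zip(base, delta)]


model = HyperLoRA(d_in=4, d_out=3, rank=2, emb_dim=5, num_tasks=2)
x = [1.0, 0.5, -0.5, 2.0]
y0 = model.forward(x, task_id=0)  # output adapted for task 0
y1 = model.forward(x, task_id=1)  # same input and backbone, adapted for task 1
```

In this sketch the two tasks differ only in their embeddings, while the backbone weight and the hypernetwork heads are shared. That is one plausible way to read the abstract's claim that a shared hypernetwork supports knowledge sharing while keeping task-specific information separate.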

Continual Learning with Pretrained Backbones by Tuning in the Input Space

Continual Pre-Training Mitigates Forgetting in Language and Vision

Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection

Improving Representational Continuity via Continued Pretraining

Adaptive Progressive Continual Learning

Continual Learning by Modeling Intra-Class Variation

Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How

Recyclable Tuning for Continual Pre-training

Towards a General Framework for Continual Learning with Pre-training

Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

Reinforced Continual Learning

Auxiliary Classifiers Improve Stability and Efficiency in Continual Learning

Sparse Orthogonal Parameters Tuning for Continual Learning

Simpler is Better: off-the-shelf Continual Learning Through Pretrained Backbones

Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora

Alleviating Representational Shift for Continual Fine-tuning

Bridging Pre-Trained Models to Continual Learning: A Hypernetwork Based Framework with Parameter-Efficient Fine-Tuning Techniques

Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification

Maintaining Plasticity in Deep Continual Learning