Multimodal Parameter-Efficient Few-Shot Class Incremental Learning

Marco D'Alessandro, Alberto Alonso, Enrique Calabrés, Mikel Galar
DOI: https://doi.org/10.1109/ICCVW60793.2023.00364
2024-01-08
Abstract: Few-Shot Class Incremental Learning (FSCIL) is a challenging continual learning task, where limited training examples are available during several learning sessions. To succeed in this task, it is necessary to avoid over-fitting new classes caused by biased distributions in the few-shot training sets. The general approach to address this issue involves enhancing the representational capability of a pre-defined backbone architecture by adding special modules for backward compatibility with older classes. However, this approach has not yet solved the dilemma of ensuring high classification accuracy over time while reducing the gap between the performance obtained on larger training sets and the smaller ones. In this work, we propose an alternative approach called Continual Parameter-Efficient CLIP (CPE-CLIP) to reduce the loss of information between different learning sessions. Instead of adapting additional modules to address information loss, we leverage the vast knowledge acquired by CLIP in large-scale pre-training and its effectiveness in generalizing to new concepts. Our approach is multimodal and parameter-efficient, relying on learnable prompts for both the language and vision encoders to enable transfer learning across sessions. We also introduce prompt regularization to improve performance and prevent forgetting. Our experimental results demonstrate that CPE-CLIP significantly improves FSCIL performance compared to state-of-the-art proposals while also drastically reducing the number of learnable parameters and training costs.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper attempts to address the problem of how to effectively learn new classes with limited training samples in the Few-Shot Class Incremental Learning (FSCIL) task, while maintaining the ability to recognize old classes and avoiding catastrophic forgetting. Specifically, the paper focuses on how to reduce information loss between different learning stages, improve the model's classification performance on small sample sets, and reduce the performance gap with large-scale training sets.

### Background and Challenges

- **Few-Shot Learning**: In each learning session, only a small number of training samples are available, making the model prone to overfitting to new classes.
- **Class Incremental Learning (CIL)**: It requires maintaining the ability to recognize existing classes while continuously introducing new classes, avoiding forgetting old knowledge.
- **Limitations of Existing Methods**: Existing methods usually enhance the representation capability of predefined backbone architectures by adding extra modules to achieve backward compatibility with old classes. However, these methods are computationally expensive and fail to effectively reduce the performance gap between small sample sets and large-scale training sets.

### Proposed Method

The paper proposes a method called **Continual Parameter-Efficient CLIP (CPE-CLIP)**, aiming to reduce information loss between different learning stages. Specific improvements include:

1. **Multi-modal and Parameter-Efficient Prompt Learning**:
   - Utilizing the large-scale pre-trained knowledge of the CLIP model, adapting the language and vision encoders through learnable prompts to achieve transfer learning across sessions.
   - Prompt learning not only reduces the number of parameters that need to be learned but also improves the model's generalization ability.
2. **Prompt Regularization**:
   - Introducing prompt regularization techniques to enhance performance and prevent forgetting. By adjusting the update rate of prompt parameters, it ensures that the model can better retain old knowledge as learning progresses.

### Experimental Results

- **Benchmark Datasets**: The paper conducts experiments on three popular FSCIL benchmark datasets: CIFAR100, miniImageNet, and CUB200-2011.
- **Performance Improvement**: Experimental results show that CPE-CLIP significantly outperforms existing methods on all benchmark datasets, especially in reducing forgetting and maintaining high performance.
- **Computational Efficiency**: CPE-CLIP significantly reduces the number of learnable parameters and training costs, improving computational efficiency.

### Main Contributions

- **Prompt Learning**: Proposes a prompt-based learning method that effectively addresses continual learning tasks in few-shot settings, reduces forgetting, and supports knowledge transfer over time.
- **Prompt Regularization**: Combines two different prompt augmentation methods and prompt regularization to smoothly transition to future tasks while maintaining stable performance.
- **Performance Advantage**: Achieves state-of-the-art performance on three popular FSCIL benchmark datasets, significantly surpassing previous best results.

In summary, the paper effectively addresses key issues in few-shot class incremental learning through the CPE-CLIP method, providing an efficient and high-performance solution for continual learning tasks.
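
### Illustrative Sketch

The two ideas under Proposed Method, learnable prompts placed in front of an encoder's input tokens and a prompt update rate that shrinks as sessions accumulate, can be sketched in a few lines of plain Python. This is not the authors' implementation. The summary does not state the exact regularization rule, so the schedule here (the share of new classes among all classes seen so far) is an assumption, and `PromptState`, `prepend_prompts`, `update_scale`, `regularized_step`, and `finish_session` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class PromptState:
    """Learnable prompt vectors carried across sessions (hypothetical container)."""
    prompts: List[Vector]
    classes_seen: int = 0


def prepend_prompts(prompts: List[Vector], tokens: List[Vector]) -> List[Vector]:
    """Place the learnable prompt vectors in front of the encoder's input tokens."""
    return [list(p) for p in prompts] + [list(t) for t in tokens]


def update_scale(classes_seen: int, new_classes: int) -> float:
    """Assumed schedule: share of new classes among all classes seen so far."""
    if new_classes <= 0 or classes_seen < 0:
        raise ValueError("expected new_classes > 0 and classes_seen >= 0")
    return new_classes / (classes_seen + new_classes)


def regularized_step(state: PromptState, grads: List[Vector],
                     lr: float, new_classes: int) -> None:
    """One gradient step on the prompts only, damped by the session scale."""
    scale = update_scale(state.classes_seen, new_classes)
    state.prompts = [
        [p - lr * scale * g for p, g in zip(prompt, grad)]
        for prompt, grad in zip(state.prompts, grads)
    ]


def finish_session(state: PromptState, new_classes: int) -> None:
    """Record the classes just learned so later sessions update more conservatively."""
    state.classes_seen += new_classes


if __name__ == "__main__":
    state = PromptState(prompts=[[0.0, 0.0], [1.0, 1.0]])
    grads = [[1.0, 1.0], [1.0, 1.0]]  # stand-in gradients, not from a real model
    # Illustrative session sizes only: a large base session, then a small one.
    for new_classes in (60, 5):
        regularized_step(state, grads, lr=0.1, new_classes=new_classes)
        finish_session(state, new_classes)
        print(new_classes, state.prompts)
```

In this sketch the encoder weights never appear in the update, which mirrors the parameter-efficient framing: only the prompt vectors change. The first session moves the prompts at the full learning rate, while a later session with few new classes moves them by a much smaller amount, which is one plausible way to read "adjusting the update rate of prompt parameters".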