Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning

Zitong Huang, Ze Chen, Zhixing Chen, Erjin Zhou, Xinxing Xu, Rick Siow Mong Goh, Yong Liu, Wangmeng Zuo, Chunmei Feng
2024-04-05
Abstract: Few-shot Class-Incremental Learning (FSCIL) aims to continuously learn new classes from very limited training data without forgetting the old ones already encountered. Existing studies rely solely on pure visual networks, whereas in this paper we address FSCIL by leveraging a Vision-Language model (e.g., CLIP) and propose a simple yet effective framework, named Learning Prompt with Distribution-based Feature Replay (LP-DiF). We observe that simply using CLIP for zero-shot evaluation can substantially outperform the most influential methods. Prompt tuning is then used to further improve its adaptation ability, allowing the model to continually capture specific knowledge from each session. To prevent the learnable prompt from forgetting old knowledge in the new session, we propose a pseudo-feature replay approach. Specifically, we preserve the old knowledge of each class by maintaining a feature-level Gaussian distribution with a diagonal covariance matrix, which is estimated from the image features of training images and synthesized features generated by a VAE. When progressing to a new session, pseudo-features sampled from old-class distributions are combined with training images of the current session to optimize the prompt, thus enabling the model to learn new knowledge while retaining old knowledge. Experiments on three prevalent benchmarks, i.e., CIFAR100, mini-ImageNet, and CUB-200, and two more challenging benchmarks proposed in this paper, i.e., SUN-397 and CUB-200$^*$, showcase the superiority of LP-DiF, which achieves new state-of-the-art (SOTA) results in FSCIL. Code is publicly available at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses Few-Shot Class-Incremental Learning (FSCIL): how a model can learn new classes from only a small number of training samples while retaining what it has learned about old classes, that is, without catastrophic forgetting. Specifically, the paper proposes a method built on a pre-trained Vision-Language model (V-L model) that learns lightweight prompts to adapt to new tasks and combines them with a pseudo-feature replay technique to prevent forgetting of old knowledge.

### Main contributions of the paper:

1. **Utilizing pre-trained Vision-Language models**: The paper shows that pre-trained V-L models (such as CLIP) are very beneficial for FSCIL because of their strong generalization ability. Based on this finding, the paper proposes a simple and effective FSCIL framework called LP-DiF (Learning Prompt with Distribution-based Feature Replay).
2. **Prompt Tuning**: Through prompt tuning, the model can continuously capture specific knowledge in each session. The paper also proposes a feature replay technique: it constructs a feature-level Gaussian distribution for each class and combines pseudo-features sampled from these distributions with the training images of the current session, thus retaining old knowledge while learning new knowledge.
3. **Extensive experimental verification**: The paper conducts extensive experiments and comparisons on three popular FSCIL benchmark datasets (CIFAR-100, CUB-200, and mini-ImageNet) and two more challenging benchmark datasets (SUN-397 and CUB-200*), demonstrating the superiority of the method.

### Method overview:

1. **Problem definition**: The goal of FSCIL is to learn new classes over consecutive sessions while preventing the model from forgetting old classes. Each session contains a limited number of new-class samples, organized in the N-way K-shot format.
2. **Prompt learning**: In each session, the prompt is optimized by minimizing the loss function. For the first session, the prompt is randomly initialized; for subsequent sessions, it is initialized with the prompt trained in the previous session and then optimized on the data of the current session together with the pseudo-features of the old classes.
3. **Pseudo-feature replay**: To prevent catastrophic forgetting, the paper proposes a pseudo-feature replay technique. Specifically, it estimates a feature-level Gaussian distribution for each class, samples pseudo-features from these distributions, and combines them with the real features of the current session to jointly optimize the prompt.

### Experimental results:

The experimental results on multiple benchmark datasets show that LP-DiF significantly outperforms existing FSCIL methods, especially in terms of Performance Degradation (PD). For example, on the mini-ImageNet dataset, the average accuracy of LP-DiF reaches 93.76%, only 1.05% lower than the upper bound given by joint training (Joint-LP).

In conclusion, by combining a pre-trained Vision-Language model with pseudo-feature replay, this paper proposes an effective method for FSCIL that can learn continually and resist forgetting in the few-shot setting.
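
The pseudo-feature replay step described in the method overview can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`estimate_diag_gaussian`, `sample_pseudo_features`, `build_replay_batch`) are hypothetical, the feature vectors are random numbers standing in for CLIP image features, and the VAE-synthesized features that the paper also uses when estimating each distribution are omitted. It only shows the idea of keeping a per-class mean and diagonal variance, sampling pseudo-features for old classes, and mixing them with the current session's features.

```python
import random
from statistics import fmean, pvariance


def estimate_diag_gaussian(features, eps=1e-6):
    """Per-dimension mean and variance of one class's feature vectors.

    A diagonal covariance means each feature dimension is treated as
    independent, so only one variance per dimension is stored.
    """
    dims = list(zip(*features))  # transpose: one tuple per feature dimension
    mean = [fmean(d) for d in dims]
    var = [pvariance(d) + eps for d in dims]  # eps keeps variances positive
    return mean, var


def sample_pseudo_features(mean, var, n_samples, rng):
    """Draw pseudo-features from the stored diagonal Gaussian."""
    return [
        [rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)]
        for _ in range(n_samples)
    ]


def build_replay_batch(old_stats, new_features, new_labels, n_per_class, rng):
    """Mix current-session features with pseudo-features of old classes."""
    feats = list(new_features)
    labels = list(new_labels)
    for class_id, (mean, var) in old_stats.items():
        feats.extend(sample_pseudo_features(mean, var, n_per_class, rng))
        labels.extend([class_id] * n_per_class)
    return feats, labels


# Toy data (assumption): 8-dimensional random vectors instead of real features.
rng = random.Random(0)
old_feats = {
    0: [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)],
    1: [[rng.gauss(3.0, 0.5) for _ in range(8)] for _ in range(20)],
}
# After an old session, only the statistics need to be kept, not the images.
old_stats = {c: estimate_diag_gaussian(f) for c, f in old_feats.items()}

# New session: 5 samples of a new class with label 2.
new_features = [[rng.gauss(-2.0, 1.0) for _ in range(8)] for _ in range(5)]
new_labels = [2] * 5

batch_x, batch_y = build_replay_batch(old_stats, new_features, new_labels, 4, rng)
print(len(batch_x), len(batch_x[0]), len(batch_y))  # 13 8 13
```

In a full pipeline the mixed batch would feed the prompt-tuning loss, so that gradients from old-class pseudo-features counteract drift caused by the new-class samples. Storing only a mean and a variance vector per class keeps the memory cost small compared with storing exemplar images.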