SRIL: Selective Regularization for Class-Incremental Learning

Jisu Han, Jaemin Na, Wonjun Hwang
DOI: https://doi.org/10.48550/arXiv.2305.05175
2023-05-09
Abstract: Human intelligence gradually accepts new information and accumulates knowledge throughout the lifespan. However, deep learning models suffer from a catastrophic forgetting phenomenon, where they forget previous knowledge when acquiring new information. Class-Incremental Learning aims to create an integrated model that balances plasticity and stability to overcome this challenge. In this paper, we propose a selective regularization method that accepts new knowledge while maintaining previous knowledge. We first introduce an asymmetric feature distillation method for old and new classes inspired by cognitive science, using the gradient of classification and knowledge distillation losses to determine whether to perform pattern completion or pattern separation. We also propose a method to selectively interpolate the weight of the previous model for a balance between stability and plasticity, and we adjust whether to transfer through model confidence to ensure the performance of the previous class and enable exploratory learning. We validate the effectiveness of the proposed method, which surpasses the performance of existing methods through extensive experimental protocols using CIFAR-100, ImageNet-Subset, and ImageNet-Full.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses catastrophic forgetting in class-incremental learning (CIL): how a model can retain previously learned knowledge while learning new classes. Specifically, it proposes selective regularization for class-incremental learning (SRIL), which balances the stability and plasticity of the model through gradient-based feature distillation (GFD) and confidence-aware weight interpolation (CWI).

### Main Contributions

1. **Gradient-based Feature Distillation (GFD)**
   - The method computes the cosine similarity between the gradients of the classification loss and the knowledge distillation loss, and uses it to generate a binary mask that decides, channel by channel, whether to apply feature distillation.
   - For new-class data, gradients of the two losses pointing in the same direction indicate pattern completion; otherwise they indicate pattern separation.
2. **Confidence-aware Weight Interpolation (CWI)**
   - The weights of the old model are interpolated into the new model, which prevents the loss on old data from changing and keeps the model stable.
   - The interpolation parameter is adjusted dynamically according to the confidence of the new model on old data. If that confidence exceeds a threshold, the regularization is removed to allow exploratory learning, balancing stability and plasticity.

### Experimental Verification

- **Data Sets**: Experiments use three data sets: CIFAR-100, ImageNet-Subset, and ImageNet-Full.
- **Performance Comparison**: Compared with existing state-of-the-art methods (such as iCaRL, BiC, LUCIR, and PODNet), SRIL achieves better performance in multiple task settings.
- **Ablation Study**: Ablation experiments analyze the effect of each component (GFD and CWI) and verify that both contribute to the result.

### Conclusion

The paper proposes SRIL to address catastrophic forgetting in class-incremental learning and reports performance superior to the compared methods on multiple data sets. Through gradient-based feature distillation and confidence-aware weight interpolation, SRIL retains old knowledge while learning new knowledge, achieving a good balance between the stability and plasticity of the model.
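
To make the two components described in the summary more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. Several details are assumptions, since the summary does not specify them: the function names (`gfd_channel_mask`, `masked_feature_distillation`, `cwi_interpolate`), the choice to keep distillation on channels whose cosine similarity is above a threshold of 0, the squared-error form of the distillation term, the linear interpolation rule, and the default values of `lam` and `tau`. Gradients and features are represented as one flat list per channel.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gfd_channel_mask(
    grad_cls: Sequence[Sequence[float]],
    grad_kd: Sequence[Sequence[float]],
    threshold: float = 0.0,  # assumed cut-off, not taken from the paper
) -> List[int]:
    """Binary mask per channel from the agreement of the two gradients.

    Assumption: a channel is distilled (mask = 1) when the classification
    and distillation gradients point in a similar direction.
    """
    return [
        1 if cosine(gc, gk) > threshold else 0
        for gc, gk in zip(grad_cls, grad_kd)
    ]


def masked_feature_distillation(
    feat_new: Sequence[Sequence[float]],
    feat_old: Sequence[Sequence[float]],
    mask: Sequence[int],
) -> float:
    """Sum of squared feature differences over the channels with mask = 1."""
    total = 0.0
    for fn, fo, m in zip(feat_new, feat_old, mask):
        total += m * sum((a - b) ** 2 for a, b in zip(fn, fo))
    return total


def cwi_interpolate(
    w_old: Sequence[float],
    w_new: Sequence[float],
    confidence: float,
    lam: float = 0.5,  # assumed interpolation weight for the old model
    tau: float = 0.9,  # assumed confidence threshold
) -> List[float]:
    """Blend old weights into new ones unless the new model is confident.

    When the new model's confidence on old data exceeds tau, the old
    weights are not mixed in (effective lam = 0), leaving the new model
    free to explore.
    """
    effective = 0.0 if confidence > tau else lam
    return [effective * o + (1.0 - effective) * n for o, n in zip(w_old, w_new)]
```

As a usage example, calling `gfd_channel_mask` on three channels whose gradients agree, oppose, and vanish respectively yields a mask that distills only the first channel, and `cwi_interpolate` returns the new weights unchanged once `confidence` is above `tau`. In a real training loop the gradients would come from an autograd framework and the confidence from the new model's predictions on stored old-class samples; both are left as inputs here.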