DELTA: Decoupling Long-Tailed Online Continual Learning

Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu
2024-04-06
Abstract: A significant challenge in achieving ubiquitous Artificial Intelligence is the limited ability of models to rapidly learn new information in real-world scenarios where data follows long-tailed distributions, all while avoiding forgetting previously acquired knowledge. In this work, we study the under-explored problem of Long-Tailed Online Continual Learning (LTOCL), which aims to learn new tasks from sequentially arriving class-imbalanced data streams. Each data is observed only once for training without knowing the task data distribution. We present DELTA, a decoupled learning approach designed to enhance learning representations and address the substantial imbalance in LTOCL. We enhance the learning process by adapting supervised contrastive learning to attract similar samples and repel dissimilar (out-of-class) samples. Further, by balancing gradients during training using an equalization loss, DELTA significantly enhances learning outcomes and successfully mitigates catastrophic forgetting. Through extensive evaluation, we demonstrate that DELTA improves the capacity for incremental learning, surpassing existing OCL methods. Our results suggest considerable promise for applying OCL in real-world applications.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to handle long-tailed data distributions in the online continual learning (OCL) setting. Specifically, it asks how a model can learn new information quickly, without forgetting previously learned knowledge, when the new tasks arriving in a data stream have severely imbalanced class sizes. This setting is common in real-world applications such as animal species identification, medical image diagnosis, and food image classification, where data is usually heavily class-imbalanced.

### Main Problems

1. **Online Continual Learning under Long-Tailed Distribution**:
   - Online continual learning requires the model to train on each data sample only once, without knowing the task's data distribution.
   - A long-tailed distribution means that a few classes have far more samples than the rest, so the model tends to overfit the majority classes and underfit the minority classes.
2. **Catastrophic Forgetting**:
   - While learning a new task, the model tends to forget previously learned knowledge, which is a major challenge in continual learning.

### Solutions

To address these problems, the paper proposes the DELTA (Decoupling Long-Tailed Online Continual Learning) framework, which consists of two main stages:

1. **Representation Learning Stage**:
   - Use supervised contrastive learning to attract samples of the same class and repel samples of different classes, thereby enhancing the feature representation.
   - The specific contrastive loss function is:
     \[ L_{\text{contrastive}}(Z_T) = \sum_{j \in T - 1} \frac{\sum_{p \in P(j)} \exp(v_j \cdot v_p / \tau)}{\sum_{k \in A(j)} \exp(v_j \cdot v_k / \tau)} \]
     where \( Z_T \) is all samples of the current task, \( P(j) \) is the set of positive samples of the same class as sample \( j \), \( A(j) \) is the set of all negative samples, and \( \tau \) is the temperature parameter.
2. **Balanced Classifier Learning Stage**:
   - Use Equalization Loss to readjust the sample weights and mitigate the impact of class imbalance.
   - The specific form of Equalization Loss is:
     \[ L_{\text{EQ}}(O_t(I_x)) = \sum_{i=1}^{k_{1:t}} -I_y(i)\,\sigma\big(\log(P(k_t)) + O_t(I_x)\big) \]
     where \( I_x \) is the input image sample, \( I_y \) is the corresponding label, \( \sigma \) is the softmax function, and \( P(k_t) \) is the sample distribution vector of task \( t \).

### Multi-Sample Learning Strategy

- To further balance the sample distribution within a batch, the paper proposes a multi-instance pairing strategy: each training sample can be paired with multiple examples, which better reflects the internal variability of the data and reduces overfitting to specific training features.

### Experimental Verification

- The paper conducts experiments on two datasets, CIFAR-100 and VFN-LT, to verify the effectiveness of the DELTA framework.
- The experimental results show that DELTA significantly outperforms existing OCL methods on online continual learning tasks under long-tailed distributions.

Together, these methods address the challenges that long-tailed data poses for online continual learning and offer an approach suited to practical applications.
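
The two losses described in the Solutions section can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the widely used form of supervised contrastive loss (with a logarithm and a per-anchor average over positives, neither of which appears in the formula as summarized here) and treats the equalization step as cross-entropy on logits shifted by the log class prior. The function names, the `tau` default, and the `class_counts` argument are hypothetical choices made for illustration.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, tau=0.1):
    """Common supervised contrastive loss over one batch (illustrative sketch)."""
    labels = np.asarray(labels)
    # L2-normalize so that dot products are cosine similarities.
    v = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = v @ v.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)  # assumed A(j): every sample except j
    positives = (labels[:, None] == labels[None, :]) & not_self  # P(j)

    # log( exp(sim_jp) / sum_{k in A(j)} exp(sim_jk) ), shifted for stability.
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0  # anchors without any positive are skipped
    per_anchor = -(log_prob * positives).sum(axis=1)[has_pos] / n_pos[has_pos]
    return per_anchor.mean()


def prior_adjusted_cross_entropy(logits, labels, class_counts):
    """Cross-entropy on logits shifted by log class priors (illustrative sketch).

    Assumes every entry of class_counts is positive.
    """
    labels = np.asarray(labels)
    counts = np.asarray(class_counts, dtype=float)
    prior = counts / counts.sum()  # plays the role of P(k_t)
    adjusted = logits + np.log(prior)  # log(P(k_t)) + O_t(I_x)
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_softmax = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(len(labels)), labels].mean()


# Toy usage: 4 samples, 2 classes, with class 0 nine times more frequent.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))
targets = np.array([0, 0, 1, 1])
scl = supervised_contrastive_loss(features, targets)
ce = prior_adjusted_cross_entropy(rng.normal(size=(4, 2)), targets, [90, 10])
```

Adding the log prior to the logits means that, for identical raw logits, a sample from a rare class incurs a larger loss than one from a frequent class, which is one way to counteract the bias toward majority classes.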