Can Continual Learning Improve Long-Tailed Recognition? Toward a Unified Framework

Mahdiyar Molahasani, Michael Greenspan, Ali Etemad
2023-06-23
Abstract: The Long-Tailed Recognition (LTR) problem emerges in the context of learning from highly imbalanced datasets, in which the number of samples among different classes is heavily skewed. LTR methods aim to accurately learn a dataset comprising both a larger Head set and a smaller Tail set. We propose a theorem where under the assumption of strong convexity of the loss function, the weights of a learner trained on the full dataset are within an upper bound of the weights of the same learner trained strictly on the Head. Next, we assert that by treating the learning of the Head and Tail as two separate and sequential steps, Continual Learning (CL) methods can effectively update the weights of the learner to learn the Tail without forgetting the Head. First, we validate our theoretical findings with various experiments on the toy MNIST-LT dataset. We then evaluate the efficacy of several CL strategies on multiple imbalanced variations of two standard LTR benchmarks (CIFAR100-LT and CIFAR10-LT), and show that standard CL methods achieve strong performance gains in comparison to baselines and approach solutions that have been tailor-made for LTR. We also assess the applicability of CL techniques on real-world data by exploring CL on the naturally imbalanced Caltech256 dataset and demonstrate its superiority over state-of-the-art classifiers. Our work not only unifies LTR and CL but also paves the way for leveraging advances in CL methods to tackle the LTR challenge more effectively.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses Long-Tailed Recognition (LTR). Real-world datasets are often severely imbalanced: some categories (the Head set) have far more samples than others (the Tail set). Deep learning models trained on such data tend to perform well on the head categories but markedly worse on the tail categories.

### Main Contributions

1. **Theoretical Contributions**:
   - A theorem is proposed and proven: if the loss function is strongly convex, the distance between the model weights trained on the complete dataset and those trained only on the Head set is bounded above. The bound shrinks as either the imbalance factor of the dataset or the strong convexity parameter of the loss function increases.
2. **Methodological Innovations**:
   - Based on this theorem, the LTR problem is decomposed into two consecutive tasks: first learning the head categories, then learning the tail categories. Continual Learning (CL) methods can then update the model weights to learn the tail categories without forgetting the head categories.
3. **Experimental Validation**:
   - Experiments on four datasets (MNIST-LT, CIFAR100-LT, CIFAR10-LT, and Caltech256) validate the effectiveness of CL methods for LTR. The results show that standard CL methods achieve significant performance improvements under long-tailed distributions, approaching or even surpassing methods designed specifically for LTR.

### Experimental Results

1. **Upper Bound Validation**:
   - On the MNIST-LT dataset, the theoretical upper bound was validated by varying the imbalance factor (IF) and the strong convexity parameter (µ). As IF or µ increases, the distance between the weights trained on the complete dataset and those trained only on the Head set decreases, consistent with the theory.
2. **LTR Benchmark Testing**:
   - On the CIFAR100-LT and CIFAR10-LT datasets, three common CL strategies (LwF, EWC, and GPM) were applied and compared with existing LTR methods. The CL methods significantly improve performance on the tail categories, although their performance on the head categories may be slightly below that of some purpose-built LTR methods.
3. **Real-World Data**:
   - On the naturally imbalanced Caltech256 dataset, an improved EWC method was used for classification. The results show that CL methods handle real-world long-tailed data well, surpassing current state-of-the-art methods.

### Conclusion

This paper proposes a theoretical framework that unifies LTR and CL and experimentally validates the effectiveness of CL methods on long-tailed distributions. These findings offer a new way to apply advances in CL to LTR.
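The qualitative trend reported in the upper-bound validation (the distance between full-data and Head-only weights shrinks as IF or µ grows) can be illustrated on a toy problem. The sketch below is not the paper's MNIST-LT experiment and does not compute the theorem's bound. It assumes a ridge-regularized least-squares loss, which is at least µ-strongly convex, and a hypothetical setup in which Head and Tail follow different linear models. The names `ridge_minimizer` and `head_vs_full_distance` are illustrative only.

```python
import numpy as np


def ridge_minimizer(X, y, mu):
    """Closed-form minimizer of mean(0.5 * (X @ theta - y)**2) + 0.5 * mu * ||theta||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + mu * np.eye(d), X.T @ y / n)


def head_vs_full_distance(imbalance_factor, mu, n_head=1000, dim=5, seed=0):
    """Distance between weights fit on Head + Tail and weights fit on Head only."""
    rng = np.random.default_rng(seed)
    # Imbalance factor taken here as the ratio of Head to Tail sample counts.
    n_tail = max(1, int(n_head / imbalance_factor))

    # Hypothetical setup: Head and Tail follow different linear models,
    # so Tail samples pull the full-data solution away from the Head solution.
    theta_head_true = np.ones(dim)
    theta_tail_true = -np.ones(dim)

    X_head = rng.normal(size=(n_head, dim))
    y_head = X_head @ theta_head_true + 0.1 * rng.normal(size=n_head)
    X_tail = rng.normal(size=(n_tail, dim))
    y_tail = X_tail @ theta_tail_true + 0.1 * rng.normal(size=n_tail)

    theta_head = ridge_minimizer(X_head, y_head, mu)
    theta_full = ridge_minimizer(
        np.vstack([X_head, X_tail]), np.concatenate([y_head, y_tail]), mu
    )
    return float(np.linalg.norm(theta_full - theta_head))


if __name__ == "__main__":
    for imbalance_factor in (1, 10, 100):
        for mu in (0.1, 1.0, 10.0):
            dist = head_vs_full_distance(imbalance_factor, mu)
            print(f"IF={imbalance_factor:>3}  mu={mu:>4}  distance={dist:.4f}")
```

In this toy setting the printed distance falls as either IF or µ increases, which matches the direction of the trend described above. It says nothing about how tight the paper's bound is.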
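The two-step view (learn the Head, then learn the Tail without forgetting the Head) can be sketched with the standard EWC penalty, one of the CL strategies named above. This is a minimal NumPy illustration of plain EWC, not the improved variant the summary mentions and not the authors' implementation. The Fisher diagonal, the quadratic Tail loss, and all function names are hypothetical choices made for the example.

```python
import numpy as np


def ewc_penalty(theta, theta_head, fisher_diag, lam):
    """Standard EWC penalty: 0.5 * lam * sum_i F_i * (theta_i - theta_head_i)^2."""
    return 0.5 * lam * float(np.sum(fisher_diag * (theta - theta_head) ** 2))


def ewc_penalty_grad(theta, theta_head, fisher_diag, lam):
    """Gradient of the EWC penalty with respect to theta."""
    return lam * fisher_diag * (theta - theta_head)


def train_tail_with_ewc(theta_head, fisher_diag, tail_grad_fn, lam=1.0, lr=0.1, steps=200):
    """Step 2: start from the Head solution and descend on Tail loss + EWC penalty."""
    theta = np.asarray(theta_head, dtype=float).copy()
    for _ in range(steps):
        grad = tail_grad_fn(theta) + ewc_penalty_grad(theta, theta_head, fisher_diag, lam)
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    # Hypothetical Step 1 result: weights after training on the Head only.
    theta_head = np.array([1.0, 1.0])
    # Hand-picked Fisher diagonal: weight 0 matters to the Head, weight 1 barely does.
    fisher_diag = np.array([10.0, 0.1])
    # Toy Tail loss 0.5 * ||theta - theta_tail_opt||^2, whose gradient is theta - theta_tail_opt.
    theta_tail_opt = np.zeros(2)
    theta = train_tail_with_ewc(theta_head, fisher_diag, lambda t: t - theta_tail_opt)
    print("weights after Tail step:", theta)
    print("EWC penalty:", ewc_penalty(theta, theta_head, fisher_diag, lam=1.0))
```

In this toy case the weight with a large Fisher value stays close to its Head value while the other moves toward the Tail optimum. That is the behavior that lets a CL method learn the Tail without overwriting what the Head required.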