How Re-sampling Helps for Long-Tail Learning?

Jiang-Xin Shi, Tong Wei, Yuke Xiang, Yu-Feng Li
2023-10-28
Abstract: Long-tail learning has received significant attention in recent years due to the challenge it poses with extremely imbalanced datasets. In these datasets, only a few classes (known as the head classes) have an adequate number of training samples, while the rest of the classes (known as the tail classes) are infrequent in the training data. Re-sampling is a classical and widely used approach for addressing class imbalance issues. Unfortunately, recent studies claim that re-sampling brings negligible performance improvements in modern long-tail learning tasks. This paper aims to investigate this phenomenon systematically. Our research shows that re-sampling can considerably improve generalization when the training images do not contain semantically irrelevant contexts. In other scenarios, however, it can learn unexpected spurious correlations between irrelevant contexts and target labels. We design experiments on two homogeneous datasets, one containing irrelevant context and the other not, to confirm our findings. To prevent the learning of spurious correlations, we propose a new context shift augmentation module that generates diverse training images for the tail class by maintaining a context bank extracted from the head-class images. Experiments demonstrate that our proposed module can boost the generalization and outperform other approaches, including class-balanced re-sampling, decoupled classifier re-training, and data augmentation methods. The source code is available at https://www.lamda.nju.edu.cn/code_CSA.ashx.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper examines why re-sampling, a classical remedy for class imbalance, appears to bring little benefit in modern long-tail learning, and proposes a context shift augmentation module to make it effective. In a long-tailed dataset, a few head classes have an adequate number of training samples, while the remaining majority of classes, the tail classes, have very few. Re-sampling is widely used to address this imbalance, yet recent studies claim that it yields negligible performance improvements in modern long-tail learning tasks. The paper investigates this phenomenon systematically. It finds that re-sampling can considerably improve generalization when the training images do not contain semantically irrelevant contexts; when they do, re-sampling can cause the model to learn spurious correlations between those irrelevant contexts and the target labels. The authors confirm these findings with experiments on two homogeneous datasets, one containing irrelevant context and the other not. To prevent the learning of spurious correlations, they propose a context shift augmentation module that maintains a context bank extracted from head-class images and uses it to generate diverse training images for the tail classes. Experiments show that the module improves generalization and outperforms other approaches, including class-balanced re-sampling, decoupled classifier re-training, and data augmentation methods.
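To make the two ideas concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that class-balanced re-sampling means drawing each class with equal probability, and it models the context bank as a fixed-size queue of (image, mask) pairs taken from head-class images, where the mask marks context pixels. The names `class_balanced_weights`, `resample`, `ContextBank`, `push`, and `paste_onto` are hypothetical, as are the mask-based pasting rule and the toy images stored as nested lists. How the paper actually extracts and fuses contexts is not described in the text above.

```python
import random
from collections import Counter, deque


def class_balanced_weights(labels):
    """Per-sample probabilities such that every class has equal total mass."""
    counts = Counter(labels)
    num_classes = len(counts)
    # A sample from a class with n members gets weight 1 / (C * n).
    return [1.0 / (num_classes * counts[y]) for y in labels]


def resample(labels, k, rng):
    """Draw k sample indices (with replacement) using class-balanced weights."""
    weights = class_balanced_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)


class ContextBank:
    """Hypothetical FIFO store of (image, mask) pairs from head-class images.

    mask[i][j] == 1 marks a pixel assumed to belong to the context
    (background); how such masks are obtained is outside this sketch.
    """

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)  # oldest entries are evicted first

    def push(self, head_image, context_mask):
        self.items.append((head_image, context_mask))

    def paste_onto(self, tail_image, rng):
        """Replace the masked pixels of a tail image with a stored context."""
        context, mask = rng.choice(list(self.items))
        return [
            [c if m else t for c, m, t in zip(c_row, m_row, t_row)]
            for c_row, m_row, t_row in zip(context, mask, tail_image)
        ]
```

With 90 head-class labels and 10 tail-class labels, `class_balanced_weights` assigns each class a total probability of 0.5, so `resample` draws tail samples roughly as often as head samples. Calling `paste_onto` on a tail image then swaps its masked region for a context taken from a head-class image, which is one simple way to vary the context a tail class is seen in.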