How Re-sampling Helps for Long-Tail Learning?

Jiang-Xin Shi, Tong Wei, Yuke Xiang, Yu-Feng Li

2023-10-28

Abstract: Long-tail learning has received significant attention in recent years due to the challenge it poses with extremely imbalanced datasets. In these datasets, only a few classes (known as the head classes) have an adequate number of training samples, while the rest of the classes (known as the tail classes) are infrequent in the training data. Re-sampling is a classical and widely used approach for addressing class imbalance issues. Unfortunately, recent studies claim that re-sampling brings negligible performance improvements in modern long-tail learning tasks. This paper aims to investigate this phenomenon systematically. Our research shows that re-sampling can considerably improve generalization when the training images do not contain semantically irrelevant contexts. In other scenarios, however, it can learn unexpected spurious correlations between irrelevant contexts and target labels. We design experiments on two homogeneous datasets, one containing irrelevant context and the other not, to confirm our findings. To prevent the learning of spurious correlations, we propose a new context shift augmentation module that generates diverse training images for the tail class by maintaining a context bank extracted from the head-class images. Experiments demonstrate that our proposed module can boost the generalization and outperform other approaches, including class-balanced re-sampling, decoupled classifier re-training, and data augmentation methods. The source code is available at https://www.lamda.nju.edu.cn/code_CSA.ashx.
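As a rough illustration of the class-balanced re-sampling the abstract refers to, the sketch below assigns each training sample a weight so that every class is drawn with equal probability. It is a minimal stdlib-only example, not the authors' code; the function names `class_balanced_weights` and `resample` and the toy label counts are assumptions made for illustration.

```python
import random
from collections import Counter


def class_balanced_weights(labels):
    """Per-sample weights such that every class has the same total probability."""
    counts = Counter(labels)
    num_classes = len(counts)
    # Each class gets total mass 1/num_classes, split evenly over its samples.
    return [1.0 / (num_classes * counts[y]) for y in labels]


def resample(labels, num_draws, seed=0):
    """Draw sample indices with replacement under class-balanced weights."""
    rng = random.Random(seed)
    weights = class_balanced_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_draws)


# Toy long-tailed label list (hypothetical): class 0 is the head, class 2 the tail.
labels = [0] * 90 + [1] * 9 + [2] * 1
weights = class_balanced_weights(labels)
indices = resample(labels, num_draws=3000)
drawn = Counter(labels[i] for i in indices)
```

In this toy setting the single tail-class sample is drawn about as often as all 90 head-class samples combined, so the model sees the same tail image, including whatever context it contains, over and over. That repetition is the situation in which the abstract says spurious correlations with irrelevant context can be learned.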

Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper examines when and why re-sampling helps in long-tail learning, and proposes a context shift augmentation module to make re-sampling more effective. Long-tail datasets are extremely imbalanced: only a few classes (head classes) have sufficient training samples, while the remaining majority of classes (tail classes) have very few. Re-sampling is a classical and widely adopted remedy for this imbalance, yet recent studies report that it brings negligible performance improvements in modern long-tail learning tasks. The paper therefore investigates this phenomenon systematically. It finds that re-sampling can considerably improve generalization when training images do not contain semantically irrelevant contexts; when such contexts are present, re-sampling can instead cause the model to learn spurious correlations between the irrelevant context and the target labels. The authors confirm these findings with experiments on two homogeneous datasets, one containing irrelevant context and the other not. To prevent the learning of spurious correlations, they propose a context shift augmentation module that maintains a context bank extracted from head-class images and uses it to generate diverse training images for the tail classes. Experiments show that the module improves generalization and outperforms other approaches, including class-balanced re-sampling, decoupled classifier re-training, and data augmentation methods.
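The context-bank idea summarized above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the `ContextBank` class, the `paste_context` function, the use of a precomputed binary context mask, and the hard pixel-wise pasting rule are all assumptions, and the summary does not specify how the context is extracted or fused. Images are represented as plain nested lists of grayscale values to keep the example dependency-free.

```python
import random
from collections import deque


class ContextBank:
    """Fixed-size FIFO store of (image, context_mask) pairs from head-class images."""

    def __init__(self, capacity, seed=0):
        self.items = deque(maxlen=capacity)  # oldest entries are dropped first
        self.rng = random.Random(seed)

    def push(self, image, context_mask):
        self.items.append((image, context_mask))

    def sample(self):
        return self.rng.choice(list(self.items))


def paste_context(tail_image, head_image, context_mask):
    """Take head pixels where the mask is 1 and keep tail pixels where it is 0."""
    return [
        [m * h + (1 - m) * t for t, h, m in zip(t_row, h_row, m_row)]
        for t_row, h_row, m_row in zip(tail_image, head_image, context_mask)
    ]


# Hypothetical 4x4 toy data: the mask marks the border as "context".
head_image = [[9.0] * 4 for _ in range(4)]
context_mask = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
tail_image = [[1.0] * 4 for _ in range(4)]

bank = ContextBank(capacity=2)
bank.push(head_image, context_mask)

ctx_image, ctx_mask = bank.sample()
augmented = paste_context(tail_image, ctx_image, ctx_mask)
```

In this sketch the augmented image keeps the tail-class content in the center and borrows its surroundings from a head-class image, so repeated draws of the same tail sample appear in varied contexts rather than always in the same one.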

Learning from Neighbors: Category Extrapolation for Long-Tail Learning

Exploring the auxiliary learning for long-tailed visual recognition

To Balance or Not to Balance: A Simple-yet-Effective Approach for Learning with Long-Tailed Distributions

Reviving Undersampling for Long-Tailed Learning

Increasing Oversampling Diversity for Long-Tailed Visual Recognition

ECS-SC: Long-tailed classification via data augmentation based on easily confused sample selection and combination

Location-Based Scene Reconstruction for Long-Tail Recognition

SWRM: Similarity Window Reweighting and Margin for Long-Tailed Recognition

Enhanced multi-branch learning for long-tailed image recognition

Coarse-to-fine Knowledge Transfer Based Long-Tailed Classification Via Bilateral-Sampling Network

LTRL: Boosting Long-tail Recognition via Reflective Learning

A Systematic Review on Long-Tailed Learning

Combining Loss Reweighting and Sample Resampling for Long-Tailed Instance Segmentation

Feature Re-Balancing for Long-Tailed Visual Recognition

Sonar Images Classification While Facing Long-Tail and Few-Shot

Hybrid ResNet Based on Joint Basic and Attention Modules for Long-Tailed Classification

Tail Classes Matter: Long-Tailed Object Detection Revisited

Long-tailed Visual Recognition with Deep Models: A Methodological Survey and Evaluation

Switching: understanding the class-reversed sampling in tail sample memorization

Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics