Abstract: Multi-label classification poses challenges due to imbalanced and noisy labels in training data. We propose a unified data augmentation method, named BalanceMix, to address these challenges. Our approach includes two samplers for imbalanced labels, generating minority-augmented instances with high diversity. It also refines multi-labels at the label-wise granularity, categorizing noisy labels as clean, re-labeled, or ambiguous for robust optimization. Extensive experiments on three benchmark datasets demonstrate that BalanceMix outperforms existing state-of-the-art methods. We release the code at https://github.com/DISL-Lab/BalanceMix.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper "Toward Robustness in Multi-label Classification: A Data Augmentation Strategy against Imbalance and Noise" aims to address two main issues in multi-label classification: **class imbalance** and **label noise**.

1. **Class Imbalance**:
   - **Inter-class Imbalance**: Some classes have significantly more positive labels than others, so the gradients of minority classes are largely ignored during optimization.
   - **Positive-Negative Imbalance**: Each instance typically has few positive labels and many negative labels, which also skews optimization.
2. **Label Noise**:
   - **Incorrect Labels**: Labels may be marked incorrectly because of human annotation errors or system failures.
   - **Random Flipping**: Labels may be randomly flipped from positive to negative or vice versa.
   - **Partially Missing Labels**: Some class labels may be missing entirely, which is more common in multi-label settings.

### Solution

To address these issues, the authors propose a new data augmentation method called **BalanceMix**. It has two main components:

1. **Minority-augmented Mixing**:
   - **Minority Class Sampler**: Samples instances that carry minority class labels with high probability, increasing the diversity of minority instances seen during training.
   - **Mixup Augmentation**: Interpolates instances drawn by the minority class sampler with instances drawn by a random sampler, which maintains diversity and avoids overfitting.
2. **Fine-grained Label-wise Management**:
   - **Clean Labels**: Identifies clean labels as those with low loss values, using a bi-modal Gaussian Mixture Model (GMM).
   - **Re-labeling**: Re-labels those labels that are not identified as clean but for which the model's prediction confidence is high.
   - **Ambiguous Labels**: Applies importance re-weighting to labels that are neither clean nor re-labeled, reducing their potential risk.

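The two components described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`minority_probs`, `minority_augmented_mix`, `clean_posterior`, `manage_labels`), the inverse-frequency sampling weights, the thresholds `clean_thr` and `conf_thr`, and the use of the GMM clean posterior as the weight for ambiguous labels are all assumptions made for illustration.

```python
import math
import random


def minority_probs(labels):
    """Per-instance sampling probability, higher for instances with rare positives.

    Assumption: weight = sum of inverse class frequencies over the positive labels.
    """
    n_cls = len(labels[0])
    freq = [max(1, sum(row[c] for row in labels)) for c in range(n_cls)]
    w = [sum(row[c] / freq[c] for c in range(n_cls)) + 1e-8 for row in labels]
    total = sum(w)
    return [v / total for v in w]


def minority_augmented_mix(feats, labels, lam, rng=None):
    """Mixup of one uniformly sampled instance with one minority-sampled instance."""
    rng = rng or random.Random()
    i = rng.randrange(len(feats))  # random sampler
    j = rng.choices(range(len(feats)), weights=minority_probs(labels))[0]  # minority sampler
    x = [lam * a + (1 - lam) * b for a, b in zip(feats[i], feats[j])]
    y = [lam * a + (1 - lam) * b for a, b in zip(labels[i], labels[j])]
    return x, y


def _pdf(v, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((v - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def clean_posterior(losses, iters=50):
    """Fit a two-component 1-D GMM by EM; return P(low-loss component | loss)."""
    mu = [min(losses), max(losses)]
    var = [((mu[1] - mu[0]) / 4) ** 2 + 1e-6] * 2
    pi = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        resp = []
        for v in losses:  # E-step: responsibilities of each component
            p = [pi[k] * _pdf(v, mu[k], var[k]) for k in range(2)]
            s = (p[0] + p[1]) or 1e-300
            resp.append([p[0] / s, p[1] / s])
        for k in range(2):  # M-step: update mean, variance, mixing weight
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * v for r, v in zip(resp, losses)) / nk
            var[k] = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, losses)) / nk + 1e-6
            pi[k] = nk / len(losses)
    low = 0 if mu[0] <= mu[1] else 1
    return [r[low] for r in resp]


def manage_labels(losses, probs, given, clean_thr=0.5, conf_thr=0.95):
    """Return (category, target, weight) for each label.

    losses: per-label loss; probs: predicted probability of the positive class;
    given: the (possibly noisy) 0/1 label. Thresholds are illustrative assumptions.
    """
    out = []
    for post, p, y in zip(clean_posterior(losses), probs, given):
        if post > clean_thr:
            out.append(("clean", y, 1.0))  # low loss: keep the given label
        elif max(p, 1 - p) > conf_thr:
            out.append(("relabeled", 1 if p > 0.5 else 0, 1.0))  # trust the model
        else:
            out.append(("ambiguous", y, post))  # down-weight by clean posterior
    return out
```

In a training loop, `lam` would typically be drawn per batch, for example with `random.betavariate(alpha, alpha)` as in standard Mixup, and the per-label weights returned by `manage_labels` would scale the corresponding terms of the loss.
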
### Experimental Results

The authors conducted extensive experiments on three benchmark datasets, showing that **BalanceMix** outperforms existing state-of-the-art methods in handling class imbalance and label noise. Specifically, **BalanceMix** achieves 91.7 mAP on the MS-COCO dataset, which is the state of the art for a ResNet backbone.

### Conclusion

By proposing **BalanceMix**, the paper addresses both class imbalance and label noise in multi-label classification, improving the robustness and performance of the model.
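For readers unfamiliar with the metric quoted in the experimental results, mAP (mean average precision) is the average over classes of each class's average precision. The sketch below shows one common non-interpolated definition. The helper names are hypothetical, and the paper's exact evaluation protocol (tie handling, interpolation) may differ.

```python
def average_precision(scores, targets):
    """AP for one class: mean of precision@rank taken at each positive instance."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if targets[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, target_matrix):
    """mAP in percent: per-class AP averaged over the columns (classes)."""
    n_cls = len(score_matrix[0])
    aps = [
        average_precision([row[c] for row in score_matrix],
                          [row[c] for row in target_matrix])
        for c in range(n_cls)
    ]
    return 100.0 * sum(aps) / n_cls
```

Rows of `score_matrix` and `target_matrix` are instances and columns are classes; classes with no positive instance contribute an AP of 0.0 in this sketch, which is a simplifying assumption.
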

Toward Robustness in Multi-label Classification: A Data Augmentation Strategy against Imbalance and Noise

DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification

Noise-robust Oversampling for Imbalanced Data Classification

Multi-label Classification: Dealing with Imbalance by Combining Labels

RobustMixGen: Data augmentation for enhancing robustness of visual-language models in the presence of distribution shift

RC-Mixup: A Data Augmentation Strategy against Noisy Data for Regression Tasks

Balancing Label Imbalance in Federated Environments Using Only Mixup and Artificially-Labeled Noise

Addressing Imbalance in Weakly Supervised Multi-Label Learning

Noisy Label Classification using Label Noise Selection with Test-Time Augmentation Cross-Entropy and NoiseMix Learning

Data Augmentation Imbalance for Imbalanced Attribute Classification

Rebalancing Multi-Label Class-Incremental Learning

Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds

A two-stage balancing strategy based on data augmentation for imbalanced text sentiment classification

ConfidentMix: Confidence-Guided Mixup for Learning With Noisy Labels

RandoMix: a mixed sample data augmentation method with multiple mixed modes

Smart data augmentation: One equation is all you need

Unveiling the Power of Mixup for Stronger Classifiers

Fine-Grained AutoAugmentation for Multi-Label Classification

Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

ASMix: an Attention-based Smooth Data Augmentation Approach.

Towards Class-Imbalance Aware Multi-Label Learning