Toward Robustness in Multi-label Classification: A Data Augmentation Strategy against Imbalance and Noise

Hwanjun Song, Minseok Kim, Jae-Gil Lee
DOI: https://doi.org/10.48550/arXiv.2312.07087
2023-12-12
Abstract: Multi-label classification poses challenges due to imbalanced and noisy labels in training data. We propose a unified data augmentation method, named BalanceMix, to address these challenges. Our approach includes two samplers for imbalanced labels, generating minority-augmented instances with high diversity. It also refines multi-labels at the label-wise granularity, categorizing noisy labels as clean, re-labeled, or ambiguous for robust optimization. Extensive experiments on three benchmark datasets demonstrate that BalanceMix outperforms existing state-of-the-art methods. We release the code at https://github.com/DISL-Lab/BalanceMix.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper "Toward Robustness in Multi-label Classification: A Data Augmentation Strategy against Imbalance and Noise" addresses two main issues in multi-label classification: **class imbalance** and **label noise**.

1. **Class Imbalance**:
   - **Inter-class Imbalance**: Some classes have significantly more positive labels than others, so the gradients of minority classes are ignored during optimization.
   - **Positive-Negative Imbalance**: Each instance typically has few positive labels and many negative labels, which also causes imbalance during optimization.
2. **Label Noise**:
   - **Incorrect Labels**: Labels may be marked incorrectly because of human annotation errors or system failures.
   - **Random Flipping**: Labels may be randomly flipped from positive to negative or vice versa.
   - **Partially Missing Labels**: Some of an instance's class labels may be missing altogether, which is more common in multi-label settings.

### Solution

To address these issues, the authors propose a new data augmentation method called **BalanceMix**. It has two main components:

1. **Minority-augmented Mixing**:
   - **Minority Class Sampler**: Increases instance diversity by sampling instances that carry minority class labels with high probability.
   - **Mixup Augmentation**: Interpolates instances drawn by the minority class sampler with instances drawn by a random sampler, which maintains diversity and avoids overfitting.
2. **Fine-grained Label-wise Management**:
   - **Clean Labels**: Identifies labels with low loss values as clean, using a bi-modal Gaussian Mixture Model (GMM).
   - **Re-labeling**: Re-labels those labels that are not identified as clean but for which the model's prediction confidence is high.
   - **Ambiguous Labels**: Applies importance re-weighting to labels that are neither clean nor re-labeled, to reduce their potential risk.
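The two components described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the inverse-frequency weighting of the minority sampler, the Beta parameter `alpha`, the choice to keep the randomly sampled instance dominant in the mix, and the thresholds `clean_thr` and `conf_thr` are all assumptions made for illustration. The clean probability `clean_prob` is assumed to come from a two-component GMM fitted to per-label losses, which is omitted here.

```python
import numpy as np


def minority_sampling_probs(labels):
    """Per-instance sampling probabilities that favor rare classes.

    labels: (n_instances, n_classes) binary multi-label matrix.
    Assumption: an instance is weighted by the inverse frequency of its
    rarest positive class; the paper's sampler may be defined differently.
    """
    class_freq = labels.mean(axis=0)                 # positive rate per class
    inv_freq = 1.0 / np.maximum(class_freq, 1e-12)
    weights = (labels * inv_freq).max(axis=1)        # rarest positive class wins
    # Instances without any positive label get the smallest weight.
    weights = np.where(weights > 0, weights, inv_freq.min())
    return weights / weights.sum()


def minority_augmented_mix(x, y, batch_size, alpha=4.0, rng=None):
    """Mixup between a uniformly sampled batch and a minority-sampled batch."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    rand_idx = rng.choice(n, size=batch_size)        # random sampler
    min_idx = rng.choice(n, size=batch_size, p=minority_sampling_probs(y))
    lam = rng.beta(alpha, alpha, size=batch_size)
    lam = np.maximum(lam, 1.0 - lam)                 # assumption: random instance dominates
    lam_x = lam.reshape((-1,) + (1,) * (x.ndim - 1)) # broadcast over feature dims
    x_mix = lam_x * x[rand_idx] + (1.0 - lam_x) * x[min_idx]
    y_mix = lam[:, None] * y[rand_idx] + (1.0 - lam[:, None]) * y[min_idx]
    return x_mix, y_mix


def categorize_labels(clean_prob, pred_prob, clean_thr=0.5, conf_thr=0.975):
    """Label-wise categories: 0 = clean, 1 = re-labeled, 2 = ambiguous.

    clean_prob: probability that each label is clean (assumed GMM output).
    pred_prob:  model's predicted probability that each label is positive.
    Both thresholds are hypothetical defaults.
    """
    confident = np.maximum(pred_prob, 1.0 - pred_prob) > conf_thr
    category = np.full(clean_prob.shape, 2)          # default: ambiguous
    category[confident] = 1                          # confident prediction: re-label
    category[clean_prob > clean_thr] = 0             # clean takes precedence
    return category
```

In a training loop under these assumptions, the mixed batch from `minority_augmented_mix` would be fed to the model, and the categories from `categorize_labels` would decide per label whether to keep the given label, replace it with the model's prediction, or down-weight its loss term.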
### Experimental Results

The authors conducted extensive experiments on three benchmark datasets, showing that **BalanceMix** outperforms existing state-of-the-art methods in handling class imbalance and label noise. Specifically, **BalanceMix** achieved 91.7 mAP on the MS-COCO dataset, which is state-of-the-art performance with a ResNet backbone.

### Conclusion

With **BalanceMix**, the paper addresses class imbalance and label noise in multi-label classification, improving the robustness and performance of the model.