SMCL: Saliency Masked Contrastive Learning for Long-tailed Recognition

Sanglee Park, Seung-won Hwang, Jungmin So
DOI: https://doi.org/10.1109/ICASSP49357.2023.10097143
2024-06-04
Abstract: Real-world data often follow a long-tailed distribution with a high imbalance in the number of samples between classes. The problem with training from imbalanced data is that some background features, common to all classes, can be unobserved in classes with scarce samples. As a result, this background correlates to biased predictions into "major" classes. In this paper, we propose saliency masked contrastive learning, a new method that uses saliency masking and contrastive learning to mitigate the problem and improve the generalizability of a model. Our key idea is to mask the important part of an image using saliency detection and use contrastive learning to move the masked image towards minor classes in the feature space, so that background features present in the masked image are no longer correlated with the original class. Experiment results show that our method achieves state-of-the-art level performance on benchmark long-tailed datasets.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses visual recognition on datasets with a long-tailed distribution. In real-world datasets, the number of samples per category varies greatly, and this imbalance makes it difficult for a model to learn distinguishing features, especially for minority classes (categories with few samples). Background features should be common across categories, but because minority-class samples are scarce during training, these background features may be incorrectly associated with majority classes (categories with many samples), biasing the model's predictions toward those majority classes.

To solve this problem, the paper proposes a new method called **Saliency Masked Contrastive Learning (SMCL)**. The method has three steps:

1. **Saliency Masking**: Use saliency detection to mask out the important parts of the image, so that only the background remains.
2. **Weighted Sampling**: Prioritize minority classes when selecting target labels, increasing the probability that a minority class is selected.
3. **Contrastive Learning**: Use contrastive learning to move the features of masked images toward minority classes in the feature space, so that background features are shared across classes and no longer bias predictions toward majority classes.

Experimental results show that SMCL achieves state-of-the-art performance on multiple long-tailed benchmark datasets, with significant improvements on minority classes.
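
The three steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the summary does not specify the saliency detector, the sampling weights, or the exact loss, so the inverse-frequency weighting, the quantile-based mask, the InfoNCE-style loss, and every function and parameter name below (`target_label_probs`, `mask_salient_region`, `pull_toward_class_loss`, `keep_fraction`, `temperature`) are assumptions chosen for illustration.

```python
import numpy as np

def target_label_probs(class_counts):
    """Target-label sampling probabilities that favor minority classes.

    Assumption: weights are inversely proportional to class frequency.
    """
    counts = np.asarray(class_counts, dtype=float)
    inv = 1.0 / counts
    return inv / inv.sum()

def mask_salient_region(image, saliency, keep_fraction=0.5):
    """Zero out the most salient pixels so that mostly background remains.

    image: (H, W, C) array; saliency: (H, W) map from any saliency detector.
    Roughly `keep_fraction` of the pixels (the least salient ones) are kept.
    """
    threshold = np.quantile(saliency, keep_fraction)
    background = saliency <= threshold          # True where the pixel is kept
    return image * background[..., None]

def pull_toward_class_loss(masked_feat, feats, labels, target_label, temperature=0.1):
    """InfoNCE-style loss with the masked-image feature as the anchor.

    Samples whose label equals `target_label` are treated as positives, so
    minimizing the loss moves the masked feature toward that class.
    """
    anchor = masked_feat / np.linalg.norm(masked_feat)
    bank = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    logits = bank @ anchor / temperature
    logits = logits - logits.max()              # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    positives = labels == target_label
    return -log_prob[positives].mean()
```

In a training loop built on this sketch, a target label would be drawn with `np.random.default_rng().choice(num_classes, p=target_label_probs(class_counts))`, the masked image would be encoded into `masked_feat`, and the loss would be added to the usual classification objective.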