Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation

JuneHyoung Kwon, Eunju Lee, Yunsung Cho, YoungBin Kim
2024-05-28
Abstract: Weakly supervised semantic segmentation (WSSS), which employs weak forms of labels, has been actively studied to alleviate the annotation cost of acquiring pixel-level labels. However, classifiers trained on biased datasets tend to exploit shortcut features and make predictions based on spurious correlations between certain backgrounds and objects, leading to poor generalization performance. In this paper, we propose shortcut mitigating augmentation (SMA) for WSSS, which generates synthetic representations of object-background combinations not seen in the training data to reduce the use of shortcut features. Our approach disentangles the object-relevant and background features. We then shuffle and combine the disentangled representations to create synthetic features of diverse object-background combinations. A classifier trained with SMA depends less on context and focuses more on the target object when making predictions. In addition, we analyze the classifier's shortcut usage after applying our augmentation, using a metric based on an attribution method. The proposed method improves semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses a problem in weakly supervised semantic segmentation (WSSS): when trained with only image-level class labels on biased datasets, classifiers tend to use background features as shortcuts, so their predictions rest on spurious correlations between backgrounds and objects and generalize poorly. The paper proposes Shortcut Mitigating Augmentation (SMA) to address this issue. SMA reduces the use of shortcut features in two steps: first, it disentangles object-relevant features from background features so that the two are not conflated; second, it synthesizes diverse object-background combinations by randomly shuffling background features across the samples in a mini-batch, so that the classifier focuses on the target object rather than on the background context. As a result, a classifier trained with SMA relies less on background shortcuts during prediction and produces more accurate localization maps. The paper also uses an attribution-based metric to analyze how much the classifier relies on shortcuts after SMA is applied, and reports that SMA improves semantic segmentation results on the PASCAL VOC 2012 and MS COCO 2014 datasets. According to the experiments, SMA reduces shortcut usage more effectively than existing data augmentation methods and improves the quality of localization maps and pseudo-masks.
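The shuffle-and-combine step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the object and background features have already been disentangled into two arrays of shape (batch, dim), and it assumes concatenation as the combination operator, which the summary does not specify. The function name `shuffle_backgrounds` and the toy arrays are hypothetical.

```python
import numpy as np


def shuffle_backgrounds(obj_feats, bg_feats, rng):
    """Pair each object feature with a background feature drawn from the mini-batch.

    obj_feats, bg_feats: arrays of shape (batch, dim), assumed already disentangled.
    Returns the synthetic features and the permutation that was applied.
    """
    if obj_feats.shape[0] != bg_feats.shape[0]:
        raise ValueError("object and background batches must have the same size")
    batch_size = obj_feats.shape[0]
    # Random permutation of sample indices; a sample may occasionally keep
    # its own background, since the permutation is unconstrained.
    perm = rng.permutation(batch_size)
    # Assumption: combine by concatenating along the feature axis.
    synthetic = np.concatenate([obj_feats, bg_feats[perm]], axis=1)
    return synthetic, perm


# Toy mini-batch of 4 samples with 2-dimensional features.
rng = np.random.default_rng(0)
obj = np.arange(8, dtype=np.float32).reshape(4, 2)
bg = 100.0 + np.arange(8, dtype=np.float32).reshape(4, 2)
synthetic, perm = shuffle_backgrounds(obj, bg, rng)
```

In this sketch the object half of each synthetic feature is unchanged while the background half comes from another position in the batch, so the image-level label of the object still applies to the synthetic feature. A classifier trained on such features would see each object class with many different backgrounds, which is the intuition behind reducing shortcut usage.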