DCAM: Disturbed Class Activation Maps for Weakly Supervised Semantic Segmentation

Jie Lei,Guoyu Yang,Shuaiwei Wang,Zunlei Feng,Ronghua Liang
DOI: https://doi.org/10.1016/j.jvcir.2023.103852
IF: 2.887
2023-01-01
Journal of Visual Communication and Image Representation
Abstract:In the field of weakly supervised semantic segmentation (WSSS), Class Activation Maps (CAM) are typically adopted to generate pseudo masks. Yet, we find that the crux of the unsatisfactory pseudo masks is the incomplete CAM. Specifically, as convolutional neural networks tend to be dominated by the specific regions in the high-confidence channels of feature maps during prediction, the extracted CAM contains only parts of the object. To address this issue, we propose the Disturbed CAM (DCAM), a simple yet effective method for WSSS. Following CAM, we adopt a binary cross-entropy (BCE) loss to train a multi-label classification model. Then, we disturb the feature map with retraining to enhance the high-confidence channels. In addition, a softmax cross-entropy (SCE) loss branch is employed to increase the model attention to the target classes. Once converged, we extract DCAM in the same way as in CAM. The evaluation on both PASCAL VOC and MS COCO shows that DCAM not only generates high-quality masks (6.2% and 1.4% higher than the benchmark models), but also enables more accurate activation in object regions. The code is available at https://github.com/gyyang23/DCAM.
What problem does this paper attempt to address?