Abstract: Object attention maps generated by image classifiers are commonly used as priors for weakly supervised semantic segmentation. However, attention maps usually locate only the most discriminative object parts. The lack of integral object localization maps heavily limits the performance of weakly supervised segmentation approaches. This paper investigates a novel way to identify entire object regions in a weakly supervised manner. We observe that image classifiers' attention maps at different training phases may focus on different parts of the target objects. Based on this observation, we propose an online attention accumulation (OAA) strategy that utilizes the attention maps at different training phases to obtain more integral object regions. Specifically, we maintain a cumulative attention map for each target category in each training image and use it to record the object regions discovered at different training phases. Although OAA can effectively mine more object regions for most images, for some training images the range of attention movement is not large, which limits the generation of integral object attention regions. To overcome this problem, we propose incorporating an attention drop layer into the online attention accumulation process to explicitly enlarge the range of attention movement during training. Our method (OAA) can be plugged into any classification network and progressively accumulates the discriminative regions into cumulative attention maps as training proceeds. Additionally, we explore using the final cumulative attention maps as pixel-level supervision, which can further assist the network in discovering more integral object regions. When the resulting attention maps are applied to the weakly supervised semantic segmentation task, our approach improves on existing state-of-the-art methods on the PASCAL VOC 2012 segmentation benchmark, achieving an mIoU score of 67.2 percent on the test set.
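
The accumulation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how attention maps from different training phases are fused, so the element-wise maximum used here is an assumption, as are the normalization step, the `(image_id, class_index)` keying scheme, and the function names `normalize` and `accumulate`. It is not the authors' implementation.

```python
from typing import Dict, List, Tuple

AttentionMap = List[List[float]]
Key = Tuple[str, int]  # (image_id, class_index): hypothetical keying scheme


def normalize(att: AttentionMap) -> AttentionMap:
    """Clamp negatives to zero and scale to [0, 1] (assumed preprocessing)."""
    clipped = [[max(v, 0.0) for v in row] for row in att]
    peak = max(max(row) for row in clipped)
    if peak == 0.0:
        return clipped
    return [[v / peak for v in row] for row in clipped]


def accumulate(
    cumulative: Dict[Key, AttentionMap], key: Key, att: AttentionMap
) -> AttentionMap:
    """Fuse the current attention map into the stored cumulative map.

    Element-wise maximum is an assumed fusion rule, chosen so that a
    region highlighted at any training phase stays recorded.
    """
    current = normalize(att)
    previous = cumulative.get(key)
    if previous is None:
        # First time this (image, class) pair is seen: start the record.
        fused = current
    else:
        fused = [
            [max(p, c) for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)
        ]
    cumulative[key] = fused
    return fused


# Two training phases attend to different corners of a 2x2 map.
store: Dict[Key, AttentionMap] = {}
accumulate(store, ("img_0", 3), [[4.0, 0.0], [0.0, 0.0]])
result = accumulate(store, ("img_0", 3), [[0.0, 0.0], [0.0, 2.0]])
```

In this toy case the cumulative map ends up covering both corners, even though neither phase highlighted both on its own, which is the behavior the abstract attributes to accumulating attention across training phases.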

End-to-end Semantic-Aware Object Retrieval Based on Region-Wise Attention

Learning Feature Embedding with Strong Neural Activations for Fine-Grained Retrieval

Object-Based Image Retrieval With Attention Analysis And Spatial Re-Ranking

Semantic Segmentation With Attention Mechanism for Remote Sensing Images

Online Attention Accumulation for Weakly Supervised Semantic Segmentation

Dual-attention-transformer-based semantic reranking for large-scale image localization

Semantic enhancement and multi-level alignment network for cross-modal retrieval

End-to-end Semantic Object Detection with Cross-Modal Alignment

Visual Content Recognition by Exploiting Semantic Feature Map with Attention and Multi-task Learning

End-to-End Instance Segmentation with Recurrent Attention

Semantic-aware scene recognition

A Synergistical Attention Model for Semantic Segmentation of Remote Sensing Images

Region-based semantic segmentation with end-to-end training

Semantic Image Segmentation Based On Attentions To Intra Scales And Inner Channels

Attention-based Natural Language Person Retrieval

Semantic Aware Attention Based Deep Object Co-segmentation

Salient Object Ranking with Position-Preserved Attention

Semantic Image Segmentation with Improved Position Attention and Feature Fusion

Adversarial Soft-detection-based Aggregation Network for Image Retrieval

An Object-Aware Network Embedding Deep Superpixel for Semantic Segmentation of Remote Sensing Images

Weakly Supervised Soft-detection-based Aggregation Method for Image Retrieval