Abstract: Weakly supervised object localization (WSOL) aims to train instance-level localizers using only image-level labels. Most prior works follow the Class Activation Map (CAM) pipeline. They multiply channel-wise features by the classification weights and sum the results to collect semantic responses, which highlights the regions that contribute to the class prediction. However, CAM-based methods treat the class contributions of all pixel positions within a channel equally, and they are biased toward assigning dominant weights to the most discriminative channels. As a result, they cannot express the fine-grained, pixel-level semantic response of each channel or model the complex contextual relations between channels, so the activation values of non-discriminative foreground regions become mixed up with those of the background. To alleviate these issues, we present a Local Semantic activation enhancement and Global Spatial correlation mining network (LSGS-Net) for accurate WSOL. Specifically, we first propose a local activation generation module that explicitly learns the semantic response of each pixel position from the channels. We then design a regularization loss that enforces consistency between similar local activations, exploiting cross-image information to improve their accuracy. We further propose a K-nearest neighbors (KNN) graph module that captures the spatial correlation between different local activations and adaptively assigns more appropriate weights when fusing them. At inference, the bounding box is determined with a foreground threshold. Extensive experiments show that LSGS-Net achieves significant and consistent improvements with various backbones on the CUB, ILSVRC, and OpenImages benchmarks, reaching 97.5% and 75.3% GT-Known LOC on CUB and ILSVRC, respectively. For segmentation quality on OpenImages, LSGS-Net exceeds the state-of-the-art (SOTA) method by 1.2% pIoU and 1.9% PxAP.
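The abstract builds on the standard CAM baseline: each channel's feature map is scaled by that channel's classification weight, the scaled maps are summed into one activation map, and at inference a foreground threshold turns the map into a bounding box. The sketch below illustrates only that generic baseline, not LSGS-Net's local activation generation module, consistency loss, or KNN graph module. It assumes feature maps given as plain nested lists and min-max normalization before thresholding. The function names are hypothetical. The box step is a simplification that takes the tightest box around all above-threshold pixels, whereas many WSOL codebases instead keep only the largest connected region.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]  # one H x W map, row-major


def class_activation_map(features: List[Grid], class_weights: List[float]) -> Grid:
    """CAM for one class: M(y, x) = sum_k w_k * f_k(y, x) over channels k."""
    if len(features) != len(class_weights):
        raise ValueError("need exactly one classification weight per channel")
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(features, class_weights):
        for y in range(h):
            for x in range(w):
                # Every pixel in a channel shares the same weight w_k,
                # which is the uniform treatment the abstract criticizes.
                cam[y][x] += wk * fmap[y][x]
    return cam


def normalize(cam: Grid) -> Grid:
    """Min-max normalize the map to [0, 1] (assumed preprocessing step)."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on flat maps
    return [[(v - lo) / span for v in row] for row in cam]


def box_from_threshold(cam: Grid, tau: float) -> Optional[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) enclosing all pixels with normalized score >= tau."""
    norm = normalize(cam)
    fg = [(x, y) for y, row in enumerate(norm) for x, v in enumerate(row) if v >= tau]
    if not fg:
        return None  # no pixel passes the foreground threshold
    xs = [x for x, _ in fg]
    ys = [y for _, y in fg]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    ch0 = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
    ch1 = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
    cam = class_activation_map([ch0, ch1], [1.0, 0.5])
    print(box_from_threshold(cam, 0.5))  # (1, 1, 2, 2)
```

Raising the threshold shrinks the box toward the most discriminative pixel (here, `tau=0.9` keeps only the pixel at column 2, row 1). This illustrates why CAM-style maps that over-weight discriminative regions tend to produce boxes that cover only part of the object.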

Proxy Probing Decoder for Weakly Supervised Object Localization: A Baseline Investigation

Spatial Likelihood Voting with Self-Knowledge Distillation for Weakly Supervised Object Detection

SLV: Spatial Likelihood Voting for Weakly Supervised Object Detection

Hierarchical Saliency Mapping for Weakly Supervised Object Localization Based on Class Activation Mapping

Task-Aware Weakly Supervised Object Localization With Transformer

Learning Local Semantic Region Activations for Weakly Supervised Object Localization

LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization

LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization

Weakly Supervised Object Localization Using Long-Range Semantic Foreground Activation

Weakly Supervised Object Localization As Domain Adaption

Multi-scale discriminative Region Discovery for Weakly-Supervised Object Localization

Weakly supervised object localization via knowledge distillation based on foreground-background contrast

Reperceive Global Vision of Transformer for Remote Sensing Images Weakly Supervised Object Localization

Rethinking the Route Towards Weakly Supervised Object Localization

HiCT: Hierarchical Comprehend of Transformer for Weakly Supervised Object Localization

Strengthen Learning Tolerance for Weakly Supervised Object Localization

Spatial-Aware Token for Weakly Supervised Object Localization

Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization

Localizing From Classification: Self-Directed Weakly Supervised Object Localization for Remote Sensing Images

Improving Weakly Supervised Object Localization Via Causal Intervention