Abstract: Weakly supervised object localization (WSOL) aims to train instance-level locators using only readily available image-level labels. Most prior works follow the Class Activation Map (CAM) pipeline: they multiply channel-wise features by classification weights and sum the results to collect semantic responses, thereby highlighting the regions that contribute to class prediction. However, CAM-based methods treat the class contributions of all pixel positions in a channel equally and assign biased, dominant weights to the discriminative channels. They therefore fail to express the fine-grained pixel-level semantic response of each channel and to model the complex contextual relations between channels, so the activation values of non-discriminative foreground regions become mixed with those of the background. To alleviate these issues, we present a Local Semantic activation enhancement and Global Spatial correlation mining network (LSGS-Net) for accurate WSOL. Specifically, we first propose a local activation generation module to explicitly learn the semantic response of each pixel position from channels. Then, we design a regularization loss to supervise the consistency between similar local activations, which utilizes cross-image information to improve the accuracy of local activations. We further propose a K-nearest neighbors graph module to capture the spatial correlation between different local activations, which can adaptively assign more appropriate weights when fusing all local activations. In the inference stage, the bounding box is determined with a foreground threshold. Extensive experiments show that LSGS-Net achieves significant and consistent improvements with various backbones on the CUB, ILSVRC, and OpenImages benchmarks, reaching 97.5% and 75.3% GT-Known LOC on CUB and ILSVRC, respectively. For segmentation quality on OpenImages, LSGS-Net exceeds the SOTA method by 1.2% pIoU and 1.9% PxAP.
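The CAM baseline and the threshold-based box extraction mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the LSGS-Net implementation; it only shows the standard CAM weighted sum that the abstract criticizes, followed by a simplified box extraction. The function names, the array shapes, the min-max normalization, and the default threshold of 0.5 are assumptions made for illustration, and the box here encloses all above-threshold pixels rather than, for example, only the largest connected component.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM sketch (hypothetical helper, not the paper's code).

    features:   (C, H, W) feature maps from the last conv layer.
    fc_weights: (num_classes, C) weights of the classification layer.
    """
    # Weighted sum over channels: every pixel in channel c shares the
    # same scalar weight fc_weights[class_idx, c].
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    # Min-max normalize to [0, 1] so a relative threshold can be applied.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def bbox_from_activation(cam, threshold=0.5):
    """Tightest box (x_min, y_min, x_max, y_max) around pixels >= threshold.

    Returns None when no pixel passes the threshold.
    """
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

Because each channel contributes through a single scalar weight, the map cannot distinguish pixel positions within a channel, which is the limitation the abstract attributes to CAM-based methods.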

LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization

SLV: Spatial Likelihood Voting for Weakly Supervised Object Detection

Learning Local Semantic Region Activations for Weakly Supervised Object Localization

Spatial Likelihood Voting with Self-Knowledge Distillation for Weakly Supervised Object Detection

Multi-scale discriminative Region Discovery for Weakly-Supervised Object Localization

Hierarchical Saliency Mapping for Weakly Supervised Object Localization Based on Class Activation Mapping

Rethinking the Route Towards Weakly Supervised Object Localization

Localizing From Classification: Self-Directed Weakly Supervised Object Localization for Remote Sensing Images

LocNet: Global Localization in 3D Point Clouds for Mobile Robots

Strengthen Learning Tolerance for Weakly Supervised Object Localization

Generalized Weakly Supervised Object Localization

LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization

EGSA: Enhanced and Global Semantic Activation for Weakly Supervised Object Localization

Weakly Supervised Object Localization Using Long-Range Semantic Foreground Activation

Improving Weakly Supervised Object Localization Via Causal Intervention

Proxy Probing Decoder for Weakly Supervised Object Localization: A Baseline Investigation

Weakly supervised object localization via knowledge distillation based on foreground-background contrast

Weakly Supervised Object Localization As Domain Adaption

Task-Aware Weakly Supervised Object Localization With Transformer

CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping