Abstract: Weakly supervised object localization (WSOL) aims to train instance-level locators using only the available image-level labels. Most prior works follow the Class Activation Map (CAM) pipeline, which multiplies channel-wise features by the classification weights and sums the results to collect semantic responses. This highlights the regions that contribute to the class prediction and thereby achieves WSOL. However, CAM-based methods treat the class contributions of all pixel positions within a channel equally, and they are biased toward assigning dominant weights to the discriminative channels. As a result, they cannot express the fine-grained, pixel-level semantic response of each channel or model the complex contextual relations between channels, so the activation values of non-discriminative foreground regions become mixed up with those of the background. To alleviate these issues, we present a Local Semantic activation enhancement and Global Spatial correlation mining network (LSGS-Net) for accurate WSOL. Specifically, we first propose a local activation generation module that explicitly learns the semantic response of each pixel position from the channels. We then design a regularization loss that enforces consistency between similar local activations and exploits cross-image information to improve their accuracy. We further propose a K-nearest neighbors (KNN) graph module that captures the spatial correlation between different local activations and adaptively assigns more appropriate weights when fusing all local activations. At inference, the bounding box is determined with a foreground threshold. Extensive experiments show that LSGS-Net achieves significant and consistent improvements with various backbones on the CUB, ILSVRC, and OpenImages benchmarks, reaching 97.5% and 75.3% GT-Known Loc on CUB and ILSVRC, respectively. For segmentation quality on OpenImages, LSGS-Net exceeds the state-of-the-art (SOTA) method by 1.2% pIoU and 1.9% PxAP.
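The CAM pipeline criticized above computes, for class $c$, $M_c(i,j) = \sum_k w_k^c F_k(i,j)$. Here $F_k$ is the $k$-th channel of the last feature map and $w_k^c$ is the classifier weight of that channel for class $c$. The minimal pure-Python sketch below makes the limitation concrete: each channel receives a single scalar weight, applied identically at every pixel. The function names (`class_activation_map`, `min_max_normalize`) are illustrative and not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def class_activation_map(features: List[Matrix], class_weights: List[float]) -> Matrix:
    """Standard CAM for one class: a weighted sum of K channel maps.

    features[k][i][j] is channel k at pixel (i, j); class_weights[k] is w_k^c.
    """
    if len(features) != len(class_weights):
        raise ValueError("need exactly one classifier weight per channel")
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for channel, weight in zip(features, class_weights):
        # The same scalar weight scales every pixel of this channel, so
        # CAM cannot express pixel-specific contributions within a channel.
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * channel[i][j]
    return cam


def min_max_normalize(cam: Matrix) -> Matrix:
    """Rescale activations to [0, 1] so one foreground threshold can be applied."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on a flat map
    return [[(v - lo) / span for v in row] for row in cam]


if __name__ == "__main__":
    feats = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    cam = class_activation_map(feats, [2.0, 0.5])
    print(cam)                     # [[2.0, 0.0], [0.0, 1.0]]
    print(min_max_normalize(cam))  # [[1.0, 0.0], [0.0, 0.5]]
```

A per-pixel alternative, like the local activation generation module described above, would replace the scalar `weight` with a learned weight map per channel. This sketch does not attempt to reproduce that module.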
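The abstract says the KNN graph module assigns fusion weights from the spatial correlation between local activations, but it does not give the formulation. The sketch below is a hypothetical, non-learned stand-in and not the paper's module. It treats each flattened local activation as a graph node and links each node to its `k` most cosine-similar neighbors. Each node is scored by its mean similarity to those neighbors, and a softmax turns the scores into fusion weights, so activations that agree with their neighborhood contribute more. All names (`knn_fusion_weights`, `fuse`, `temperature`) are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def knn_fusion_weights(activations: List[Vector], k: int, temperature: float = 1.0) -> List[float]:
    """Hypothetical rule: softmax over each node's mean similarity to its k nearest neighbors."""
    n = len(activations)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of activations")
    scores = []
    for i in range(n):
        # Similarities to every other node, highest first; the top k are the KNN edges.
        sims = sorted(
            (cosine(activations[i], activations[j]) for j in range(n) if j != i),
            reverse=True,
        )
        scores.append(sum(sims[:k]) / k)
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(activations: List[Vector], weights: List[float]) -> Vector:
    """Weighted sum of the activations, computed position by position."""
    return [
        sum(w * act[p] for w, act in zip(weights, activations))
        for p in range(len(activations[0]))
    ]


if __name__ == "__main__":
    acts = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]  # the third map is an outlier
    w = knn_fusion_weights(acts, k=1)
    print(w)  # the outlier receives the smallest weight
    print(fuse(acts, w))
```

In the actual network the edge weights and the aggregation would presumably be learned end to end. The fixed softmax here only shows how a KNN graph can convert pairwise agreement into adaptive fusion weights.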
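At inference the box is obtained by thresholding a normalized activation map. Under the GT-Known Loc metric, a prediction counts as correct when the ground-truth class is given and the predicted box has IoU of at least 0.5 with the ground-truth box. The sketch below illustrates both steps under a simplifying assumption: it takes the tight box around all pixels at or above the threshold `tau`. Many WSOL evaluations instead keep only the largest connected component, which is omitted here. Function names are illustrative, and boxes use inclusive pixel coordinates.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def box_from_map(act: List[List[float]], tau: float) -> Optional[Box]:
    """Tight box around every pixel whose normalized activation is >= tau."""
    fg = [(x, y) for y, row in enumerate(act) for x, v in enumerate(row) if v >= tau]
    if not fg:
        return None  # nothing passes the foreground threshold
    xs = [x for x, _ in fg]
    ys = [y for _, y in fg]
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(b: Box) -> int:
    return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = ix * iy
    return inter / (box_area(a) + box_area(b) - inter)


def gt_known_correct(act: List[List[float]], gt_box: Box, tau: float, iou_thresh: float = 0.5) -> bool:
    """GT-Known criterion for one image: act must be the map of the ground-truth class."""
    pred = box_from_map(act, tau)
    return pred is not None and iou(pred, gt_box) >= iou_thresh


if __name__ == "__main__":
    act = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.8], [0.1, 0.7, 0.6]]
    print(box_from_map(act, 0.5))                 # (1, 1, 2, 2)
    print(gt_known_correct(act, (0, 0, 2, 2), 0.5))  # False: IoU = 4/9 < 0.5
```

GT-Known Loc over a dataset is the fraction of images for which `gt_known_correct` holds. Because the box depends directly on `tau`, a map that mixes non-discriminative foreground with background values can yield either an undersized or an oversized box.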