Abstract: Weakly supervised object localization (WSOL) aims to train instance-level locators using only readily available image-level labels. Most prior works follow the Class Activation Map (CAM) pipeline: they multiply channel-wise features by the classification weights and sum the results to collect semantic responses, thereby highlighting the regions that contribute to the class prediction. However, CAM-based methods treat the class contribution of every pixel position in a channel equally and assign biased, dominant weights to the discriminative channels. As a result, they fail to express the fine-grained pixel-level semantic response of each channel and to model the complex contextual relations between channels, so the activation values of non-discriminative foreground regions become mixed up with those of the background. To alleviate these issues, we present a Local Semantic activation enhancement and Global Spatial correlation mining network (LSGS-Net) for accurate WSOL. Specifically, we first propose a local activation generation module that explicitly learns the semantic response of each pixel position from the channels. Then, we design a regularization loss that supervises the consistency between similar local activations and uses cross-image information to improve their accuracy. We further propose a K-nearest neighbors graph module that captures the spatial correlation between different local activations and adaptively assigns more appropriate weights when fusing all local activations. In the inference stage, the bounding box is determined with a foreground threshold. Extensive experiments show that LSGS-Net achieves significant and consistent improvements with various backbones on the CUB, ILSVRC, and OpenImages benchmarks, reaching a GT-Known LOC of 97.5% on CUB and 75.3% on ILSVRC. For segmentation quality on OpenImages, LSGS-Net exceeds the SOTA method by 1.2% pIoU and 1.9% PxAP.
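
To make the CAM baseline described in the abstract concrete, the following is a minimal sketch of the standard CAM computation (a weighted sum of channel feature maps) followed by foreground thresholding to obtain a box. It is not the LSGS-Net method. The function names, the toy feature maps, the min-max normalization, and the 0.5 threshold are illustrative assumptions, and the box is simply the tight rectangle around all above-threshold positions, whereas practical pipelines often keep only the largest connected region.

```python
# Sketch of the generic CAM pipeline (assumed names; not the authors' code).
# Feature maps are plain nested lists shaped C x H x W to avoid dependencies.

def class_activation_map(features, weights):
    """Sum each channel's feature map scaled by that channel's class weight."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for channel, weight in zip(features, weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    return cam


def normalize(cam):
    """Min-max scale the map to [0, 1]; a constant map becomes all zeros."""
    flat = [value for row in cam for value in row]
    low, high = min(flat), max(flat)
    if high == low:
        return [[0.0 for _ in row] for row in cam]
    return [[(value - low) / (high - low) for value in row] for row in cam]


def bounding_box(cam, threshold):
    """Tight box (x_min, y_min, x_max, y_max) around positions >= threshold."""
    points = [(x, y)
              for y, row in enumerate(cam)
              for x, value in enumerate(row)
              if value >= threshold]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy input: channel 0 fires on the object, channel 1 on a background corner.
features = [
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
    [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
]
weights = [1.0, 0.2]  # hypothetical classifier weights for one class

cam = normalize(class_activation_map(features, weights))
box = bounding_box(cam, threshold=0.5)
```

Note that every position in a channel is scaled by the same scalar weight, which is the per-channel uniformity the abstract identifies as a limitation of CAM-based methods.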

Spatial Continuity and Nonequal Importance in Salient Object Detection With Image-Category Supervision

Spatial Likelihood Voting with Self-Knowledge Distillation for Weakly Supervised Object Detection.

SLV: Spatial Likelihood Voting for Weakly Supervised Object Detection

Salient Object Detection Based on Visual Perceptual Saturation and Two-Stream Hybrid Networks.

Superpixel Consistency Saliency Map Generation for Weakly Supervised Semantic Segmentation of Remote Sensing Images

WUSL–SOD: Joint Weakly Supervised, Unsupervised and Supervised Learning for Salient Object Detection

Category-Aware Saliency Enhance Learning Based on CLIP for Weakly Supervised Salient Object Detection

Noise-Sensitive Adversarial Learning for Weakly Supervised Salient Object Detection

Salient Object Detection with Image-level Binary Supervision

Spatial Structure Constraints for Weakly Supervised Semantic Segmentation

Weakly Supervised Salient Object Detection Using Image Labels

Saliency Guided End-to-End Learning for Weakly Supervised Object Detection

Saliency Guided End-to-End Learning for Weakly Supervised Object Detection.

A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels

Salient Object Detection via Bounding-box Supervision

Self-Training-Based Semantic-Balanced Network for Weakly Supervised Object Detection in Remote-Sensing Images

SAM-Induced Pseudo Fully Supervised Learning for Weakly Supervised Object Detection in Remote Sensing Images

Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes

Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning

Learning Local Semantic Region Activations for Weakly Supervised Object Localization