Abstract: Weakly Supervised Semantic Segmentation (WSSS) with only image-level labels reduces the annotation burden and has developed rapidly in recent years. However, current mainstream methods use only single-image information to localize targets and do not account for relationships across images. In Remote Sensing (RS) images, which are characterized by complex backgrounds and multiple categories, it is challenging to locate targets and differentiate between their categories. In contrast to previous methods that mostly focus on single-image information, we propose CISM, a novel cross-image semantic mining WSSS framework. CISM explores cross-image semantics in multi-category RS scenes for the first time with two novel loss functions: the Common Semantic Mining (CSM) loss and the Non-common Semantic Contrastive (NSC) loss. In particular, prototype vectors and the Prototype Interactive Enhancement (PIE) module are employed to capture semantic similarities and differences across images. To overcome category confusion and interference from closely related backgrounds, we integrate the Single-Label Secondary Classification (SLSC) task and the corresponding single-label loss into our framework. Furthermore, a Multi-Category Sample Generation (MCSG) strategy is devised to balance the distribution of samples among categories and drastically increase the diversity of images. These designs facilitate the generation of more accurate and finer-grained Class Activation Maps (CAMs) for each category of targets. Extensive experiments demonstrate the superiority of our approach on the RS dataset: it is the first WSSS framework to explore cross-image semantics in multi-category RS scenes, and it obtains state-of-the-art results on the iSAID dataset using only image-level labels.
Experiments on the PASCAL VOC2012 dataset also demonstrate the effectiveness and competitiveness of the algorithm, which pushes the mean Intersection-over-Union (mIoU) to 67.3% and 68.5% on the validation and test sets of PASCAL VOC2012, respectively.
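
For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how it can be computed for a single predicted label map against its ground truth. It is an illustration only, not the authors' evaluation code: the function name `mean_iou` and the flat-list input format are assumptions made here, and benchmark protocols typically accumulate the counts over an entire dataset before averaging rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes present in pred or target.

    pred, target: flat sequences of integer class ids of equal length
    (hypothetical input format chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

A class that appears in neither map is excluded from the average in this sketch; other conventions exist, so reported numbers should always follow the evaluation protocol of the benchmark in question.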

PCSformer: Pair-wise Cross-scale Sub-prototypes Mining with CNN-transformers for Weakly Supervised Semantic Segmentation

MECPformer: Multi-estimations Complementary Patch with CNN-Transformers for Weakly Supervised Semantic Segmentation

APC: Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation

PSSD-Transformer: Powerful Sparse Spike-Driven Transformer for Image Semantic Segmentation

Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation

Weakly Supervised Semantic Segmentation in Aerial Imagery via Cross-Image Semantic Mining

Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation

Boosting Weakly-Supervised Image Segmentation Via Representation, Transform, and Compensator

Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping

Dual-Augmented Transformer Network for Weakly Supervised Semantic Segmentation

Token Contrast for Weakly-Supervised Semantic Segmentation

Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation

ProCNS: Progressive Prototype Calibration and Noise Suppression for Weakly-Supervised Medical Image Segmentation

CoBra: Complementary Branch Fusing Class and Semantic Knowledge for Robust Weakly Supervised Semantic Segmentation

Weakly-Supervised Semantic Segmentation with Visual Words Learning and Hybrid Pooling

MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation

Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach

Complete Instances Mining for Weakly Supervised Instance Segmentation

SSA-Seg: Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation