Abstract: Research into Few-shot Semantic Segmentation (FSS) has attracted great attention, with the goal of segmenting target objects in a query image given only a few annotated support images of the target class. A key to this challenging task is to fully utilize the information in the support images by exploiting fine-grained correlations between the query and support images. However, most existing approaches either compress the support information into a few class-wise prototypes, or use partial support information (e.g., only foreground) at the pixel level, causing non-negligible information loss. In this paper, we propose Dense pixel-wise Cross-query-and-support Attention weighted Mask Aggregation (DCAMA), where both foreground and background support information are fully exploited via multi-level pixel-wise correlations between paired query and support features. Implemented with the scaled dot-product attention in the Transformer architecture, DCAMA treats every query pixel as a token, computes its similarities with all support pixels, and predicts its segmentation label as an additive aggregation of all the support pixels' labels, weighted by the similarities. Based on the unique formulation of DCAMA, we further propose efficient and effective one-pass inference for n-shot segmentation, where pixels of all support images are collected for the mask aggregation at once. Experiments show that our DCAMA significantly advances the state of the art on standard FSS benchmarks of PASCAL-5$^i$, COCO-20$^i$, and FSS-1000, e.g., with 3.1%, 9.7%, and 3.6% absolute improvements in 1-shot mIoU over previous best records. Ablative studies also verify the design of DCAMA.
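The aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single feature level, no learned projections, and plain Python lists of per-pixel feature vectors, and the names `softmax` and `aggregate_masks` are hypothetical. Each query pixel is scored against every support pixel with scaled dot-product attention, and its foreground score is the attention-weighted sum of the support labels. One-pass n-shot inference is mimicked by concatenating the pixels of all support images before a single aggregation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_masks(query_feats, support_feats, support_labels):
    """Return one foreground score in [0, 1] per query pixel.

    query_feats:    list of d-dim vectors, one per query pixel (token)
    support_feats:  list of d-dim vectors, one per support pixel
    support_labels: list of 0/1 mask values, aligned with support_feats
    """
    d = len(query_feats[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product attention
    preds = []
    for q in query_feats:
        # Similarity of this query pixel with every support pixel.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in support_feats]
        weights = softmax(scores)
        # Additive aggregation of support labels, weighted by similarity.
        preds.append(sum(w * y for w, y in zip(weights, support_labels)))
    return preds


# Toy 2-shot example (made-up features): pool pixels from both support
# images and aggregate once, instead of running one pass per image.
support_a = ([[1.0, 0.0], [0.0, 1.0]], [1, 0])
support_b = ([[0.9, 0.1], [0.1, 0.9]], [1, 0])
feats = support_a[0] + support_b[0]
labels = support_a[1] + support_b[1]

query = [[4.0, 0.0], [0.0, 4.0]]
scores = aggregate_masks(query, feats, labels)
```

In this toy setup the first query pixel resembles the foreground support pixels and receives a score above 0.5, while the second resembles the background and receives a score below 0.5. Because background pixels carry label 0 but still take part in the softmax, they lower the score of query pixels that match them, which is one way to read the abstract's point that background support information is also used.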

DCAM: Disturbed Class Activation Maps for Weakly Supervised Semantic Segmentation

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Rethinking CAM in Weakly-Supervised Semantic Segmentation

Extracting Class Activation Maps from Non-Discriminative Features as well

Erase then Grow: Generating Correct Class Activation Maps for Weakly-Supervised Semantic Segmentation

Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation

Clustering-Guided Class Activation for Weakly Supervised Semantic Segmentation

Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation

Background Noise Reduction of Attention Map for Weakly Supervised Semantic Segmentation

SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation

Partial Class Activation Attention for Semantic Segmentation

Spatial Structure Constraints for Weakly Supervised Semantic Segmentation

P-NOC: adversarial training of CAM generating networks for robust weakly supervised semantic segmentation priors

Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation

Self-Supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

Class-related Graph Convolution for Weakly Supervised Semantic Segmentation

Weakly supervised semantic segmentation based on superpixel affinity

Beyond Discriminative Regions: Saliency Maps as Alternatives to CAMs for Weakly Supervised Semantic Segmentation

Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation