Abstract: Few-shot segmentation aims to segment query images guided by only a few annotated images from the support set. Previous methods rely on mining the feature embedding similarity between the query and support images to achieve successful segmentation. However, these models tend to perform poorly when the query instances differ substantially from the support ones. To enhance model robustness against such intra-class variance, we propose a Double Recalibration Network (DRNet) with two recalibration modules, i.e., the Self-adapted Recalibration (SR) module and the Cross-attended Recalibration (CR) module. In particular, beyond learning robust feature embeddings for pixel-wise comparison between support and query as in conventional methods, DRNet further exploits semantic-aware knowledge embedded in the query image to help segment the query itself, which we call 'self-adapted recalibration'. More specifically, DRNet first employs guidance from the support set to roughly predict an incomplete but correct initial object region for the query image, and then uses the feature embedding extracted from this incomplete object region to segment the query image in turn. In addition, we devise a CR module to refine the feature representation of the query image by propagating the underlying knowledge embedded in the support image's foreground to the query. Instead of global pooling over the foreground, we refine the response at each pixel in the query feature map by attending to all foreground pixels in the support feature map and taking their average weighted by similarity; the feature maps of the query image are then added back to the weighted feature maps through a residual connection. With these two recalibration modules, DRNet can effectively address intra-class variance under the few-shot setting and mine more accurate target regions for query images. We conduct extensive experiments on the popular benchmarks PASCAL-5^i and COCO-20^i. The DRNet with the best configuration achieves an mIoU of 63.6% and 64.9% on PASCAL-5^i and 44.7% and 49.6% on COCO-20^i for the 1-shot and 5-shot settings respectively, significantly outperforming state-of-the-art methods without any bells and whistles. Code is available at: https://github.com/fangzy97/drnet.
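
The two recalibration steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function names, the scaled dot-product softmax used as the similarity weighting, and the masked-average prototype with cosine similarity in the self-adapted step are all assumptions made for illustration; the abstract only states that each query pixel attends to all support foreground pixels with a residual connection, and that features from the initial query region are used to segment the query.

```python
import numpy as np


def cross_attended_recalibration(query_feat, support_feat, support_mask):
    """Refine query features by attending to support foreground pixels.

    query_feat:   (C, Hq, Wq) query feature map
    support_feat: (C, Hs, Ws) support feature map
    support_mask: (Hs, Ws) binary foreground mask of the support image
    """
    c, hq, wq = query_feat.shape
    q = query_feat.reshape(c, -1).T              # (Nq, C)
    s = support_feat.reshape(c, -1).T            # (Ns, C)
    fg = s[support_mask.reshape(-1) > 0]         # (Nf, C) foreground pixels only
    if fg.shape[0] == 0:
        return query_feat.copy()                 # nothing to attend to

    # Assumption: similarity is a scaled dot product normalised by softmax.
    logits = q @ fg.T / np.sqrt(c)               # (Nq, Nf)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    attended = weights @ fg                      # similarity-weighted average
    out = attended + q                           # residual connection
    return out.T.reshape(c, hq, wq)


def self_adapted_recalibration(query_feat, initial_mask):
    """Score each query pixel against a prototype taken from the query itself.

    Assumption: the prototype is the masked average of the query features
    over the initial (incomplete) region, compared by cosine similarity.
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, -1)                # (C, N)
    m = initial_mask.reshape(-1).astype(float)   # (N,)
    if m.sum() == 0:
        return np.zeros((h, w))

    proto = (q * m).sum(axis=1) / m.sum()        # (C,) masked average pooling
    q_norm = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    p_norm = proto / (np.linalg.norm(proto) + 1e-8)
    return (p_norm @ q_norm).reshape(h, w)       # cosine similarity map
```

In this sketch the output of `self_adapted_recalibration` is a similarity map in [-1, 1] that could be thresholded or passed to a decoder; how DRNet actually converts it into a mask is not specified in the abstract.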

CRNet: Collaborative Refinement Network for Self-Supervised Video Object Segmentation

Self Supervised Progressive Network for High Performance Video Object Segmentation

Motion-Guided Cascaded Refinement Network for Video Object Segmentation

Self-Supervised Deep TripleNet for Video Object Segmentation

Video Object Segmentation via Structural Feature Reconfiguration

Self-supervised Video Object Segmentation Using Integration-Augmented Attention

CRNet: Cross-Reference Networks for Few-Shot Segmentation

Spatiotemporal Graph Neural Network Based Mask Reconstruction for Video Object Segmentation

Semi-supervised Video Object Segmentation Via an Edge Attention Gated Graph Convolutional Network

DRNet: Double Recalibration Network for Few-Shot Semantic Segmentation

Towards Robust Video Object Segmentation with Adaptive Object Calibration

Semi-supervised Video Object Segmentation with Recurrent Neural Network

Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

Enhanced Memory Network for Video Segmentation

Attention-Guided Network for Semantic Video Segmentation

Robust and Efficient Memory Network for Video Object Segmentation

Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation

Fast Video Object Segmentation Via Dynamic Targeting Network

Dual Cross-Attention for Video Object Segmentation Via Uncertainty Refinement

Full-duplex strategy for video object segmentation

RANet: Ranking Attention Network for Fast Video Object Segmentation