Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation

Shijie Chang, Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu

2024-07-16

Abstract: Existing few-shot segmentation (FSS) methods mainly focus on prototype feature generation and the query-support matching mechanism. As a crucial prompt for generating prototype features, the pair of image-mask types in the support set has become the default setting. However, various types such as image, text, box, and mask all can provide valuable information regarding the objects in context, class, localization, and shape appearance. Existing work focuses on specific combinations of guidance, leading FSS into different research branches. Rethinking guidance types in FSS is expected to explore the efficient joint representation of the coupling between the support set and query set, giving rise to research trends in the weakly or strongly annotated guidance to meet the customized requirements of practical users. In this work, we provide the generalized FSS with seven guidance paradigms and develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image. Leveraging the advantages of large-scale pre-training vision-language models in textual and visual embeddings, UniFSS proposes high-level spatial correction and embedding interactive units to overcome the semantic ambiguity drawbacks typically encountered by pure visual matching methods when facing intra-class appearance diversities. Extensive experiments show that UniFSS significantly outperforms the state-of-the-art methods. Notably, the weakly annotated class-aware box paradigm even surpasses the finely annotated mask paradigm.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the narrow range of prompt types used in the support set of existing few-shot segmentation (FSS) methods, which rely mainly on image-mask pairs to generate prototype features. Images, text, bounding boxes, and masks can all provide valuable information about the context, category, location, and shape and appearance of the target object. The authors therefore rethink the prompt types in FSS and propose a universal vision-language framework (UniFSS) that integrates prompt information from text, masks, bounding boxes, and images. The main contributions of the paper are:

1. **Rethinking prompt types**: According to the paper, this is the first work to systematically summarize and evaluate the different forms of support-set prompts and to attempt to unify the separate task branches of FSS.
2. **Designing a universal vision-language framework (UniFSS)**: The framework is compatible with both vision-vision and vision-text correlations, combines the complementary advantages of appearance, shape, and local and global context, and addresses the challenges caused by intra-class diversity. As a general architecture, UniFSS can handle 7 task modes without modifying the model.
3. **Experimental results**: The experiments demonstrate how well different support-set prompt types, and their combinations, guide segmentation of query images. UniFSS significantly outperforms existing state-of-the-art algorithms on three popular FSS benchmark datasets: PASCAL-5i, COCO-20i, and FSS-1000.

Through these contributions, the authors hope to advance FSS research and help unify FSS tasks.
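To make "generating prototype features" from an image-mask pair concrete, the sketch below uses masked average pooling, a common technique in prototype-based FSS. It is not taken from the paper and is not the UniFSS method. The helper names (`box_to_mask`, `masked_average_pooling`, `cosine_similarity`), the box convention, and the toy feature map are assumptions made only for illustration.

```python
import math
from typing import List, Tuple

Feature = List[List[List[float]]]  # channels x height x width
Mask = List[List[int]]             # height x width, values 0 or 1


def box_to_mask(box: Tuple[int, int, int, int], height: int, width: int) -> Mask:
    """Rasterize a box (x0, y0, x1, y1), with x1 and y1 exclusive (assumed convention)."""
    x0, y0, x1, y1 = box
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


def masked_average_pooling(features: Feature, mask: Mask) -> List[float]:
    """Average each feature channel over the pixels selected by the mask."""
    area = sum(sum(row) for row in mask)
    if area == 0:
        raise ValueError("mask selects no pixels")
    prototype = []
    for channel in features:
        total = sum(
            channel[y][x] * mask[y][x]
            for y in range(len(mask))
            for x in range(len(mask[0]))
        )
        prototype.append(total / area)
    return prototype


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity, often used to match query features to a prototype."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm


# Toy support feature map: 2 channels, 2 x 3 pixels (hypothetical values).
features = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]],
]
mask = [[0, 1, 1], [0, 1, 1]]

mask_prototype = masked_average_pooling(features, mask)
box_prototype = masked_average_pooling(features, box_to_mask((1, 0, 3, 2), 2, 3))
```

In this toy case the box covers exactly the masked pixels, so both prototypes coincide. In general a box also includes background pixels, which is one reason box guidance is considered weaker annotation than a mask.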

When Masked Image Modeling Meets Source-free Unsupervised Domain Adaptation: Dual-Level Masked Network for Semantic Segmentation

CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

A Joint Framework Towards Class-aware and Class-agnostic Alignment for Few-shot Segmentation

Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond

MetaMask: Improving Few-Shot Semantic Segmentation Via Multi-Mask Calibriation

Masked Cross-image Encoding for Few-shot Segmentation

Adaptive FSS: A Novel Few-Shot Segmentation Framework Via Prototype Enhancement

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

RPMG-FSS: Robust Prior Mask Guided Few-Shot Semantic Segmentation

Iterative Few-shot Semantic Segmentation from Image Label Text

Few-Shot Segmentation Via Divide-and-Conquer Proxies

On filling the intra-class and inter-class gaps for few-shot segmentation

Self-Support Few-Shot Semantic Segmentation

Bridge the Points: Graph-based Few-shot Segment Anything Semantically

Mining Latent Classes for Few-shot Segmentation

High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study

FGN: Fully Guided Network for Few-Shot Instance Segmentation

Reflection Invariance Learning for Few-shot Semantic Segmentation

No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation