Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation

Hanbo Bi, Yingchao Feng, Wenhui Diao, Peijin Wang, Yongqiang Mao, Kun Fu, Hongqi Wang, Xian Sun
2024-09-16
Abstract: For more efficient generalization to unseen domains (classes), most Few-shot Segmentation (FSS) methods directly exploit pre-trained encoders and fine-tune only the decoder, especially in the current era of large models. However, such fixed feature encoders tend to be class-agnostic, inevitably activating objects that are irrelevant to the target class. In contrast, humans can effortlessly focus on specific objects in their line of sight. This paper mimics the visual perception pattern of human beings and proposes a novel and powerful prompt-driven scheme, called "Prompt and Transfer" (PAT), which constructs a dynamic class-aware prompting paradigm to tune the encoder to focus on the object of interest (target class) in the current task. Three key points are elaborated to enhance the prompting: 1) Cross-modal linguistic information is introduced to initialize prompts for each task. 2) Semantic Prompt Transfer (SPT) precisely transfers the class-specific semantics within the images to prompts. 3) Part Mask Generator (PMG) works in conjunction with SPT to adaptively generate different but complementary part prompts for different individuals. Surprisingly, PAT achieves competitive performance on 4 different tasks including standard FSS, Cross-domain FSS (e.g., CV, medical, and remote sensing domains), Weak-label FSS, and Zero-shot Segmentation, setting new state-of-the-art results on 11 benchmarks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses a weakness of Few-shot Segmentation (FSS) methods on unseen classes: they rely on pre-trained encoders with fixed parameters, which makes the encoders class-agnostic, so they inevitably activate objects unrelated to the target class. Specifically:

1. **Existing problems**:
   - Most current FSS methods directly use pre-trained encoders (such as ResNet-50 pre-trained on ImageNet) and fine-tune only the decoder. This keeps the encoder from becoming biased towards seen classes, but it also leaves the encoder class-insensitive.
   - A class-insensitive encoder activates regions unrelated to the target class during feature extraction, which increases the burden on the subsequent decoder and degrades segmentation quality.
2. **Inspiration from human visual perception**:
   - Humans can selectively focus on specific objects and ignore irrelevant ones. Inspired by this, the authors argue that an ideal feature encoder should dynamically focus on objects of a specific class, depending on the task.
3. **Solutions**:
   - The paper proposes a new prompt-driven scheme, called "Prompt and Transfer" (PAT), which tunes the encoder with dynamically generated class-aware prompts so that it focuses on the target class of the current task.
   - PAT strengthens the prompts through three key points:
     1. **Initializing prompts with cross-modal language information**: Text information initializes the prompts for each task, giving them an initial class-awareness.
     2. **Semantic Prompt Transfer (SPT)**: The class-specific semantics in the image are precisely transferred to the prompts, further strengthening their class-awareness.
     3. **Part Mask Generator (PMG)**: Working with SPT, it generates different but complementary part masks for different individuals, which avoids redundant prompts and fully exploits local semantic cues.
4. **Effects**:
   - PAT achieves competitive performance on four different tasks (standard FSS, cross-domain FSS, weak-label FSS, and zero-shot segmentation), setting new state-of-the-art results on 11 benchmarks.

In summary, this paper aims to solve the class-insensitivity of the encoder in existing FSS methods. By introducing a dynamic class-aware prompting mechanism, it improves the model's segmentation performance on unseen classes.
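
The idea of transferring class-specific image semantics into a prompt can be illustrated with a toy example. The sketch below is not the paper's SPT implementation, which this page does not specify. It assumes a single prompt vector (standing in for a text-initialized prompt), per-pixel feature vectors, and a binary target-class mask, and it updates the prompt with softmax attention restricted to the masked pixels. The function name `masked_prompt_transfer` and all inputs are hypothetical.

```python
import math


def masked_prompt_transfer(prompt, features, mask):
    """Toy illustration (assumed, not the paper's SPT): update a prompt
    vector by attending only to pixels inside the target-class mask.

    prompt:   list of d floats, e.g. initialised from a text embedding
    features: list of per-pixel feature vectors, each of length d
    mask:     list of 0/1 flags, 1 marks a target-class pixel
    """
    d = len(prompt)
    # Keep only foreground pixels so background objects cannot leak in.
    fg = [f for f, m in zip(features, mask) if m == 1]
    if not fg:
        return list(prompt)  # nothing to transfer from

    # Scaled dot-product scores between the prompt and each foreground pixel.
    scores = [sum(p * x for p, x in zip(prompt, f)) / math.sqrt(d) for f in fg]

    # Numerically stable softmax over the foreground pixels.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of foreground features, added residually.
    pooled = [sum(w * f[i] for w, f in zip(weights, fg)) for i in range(d)]
    return [p + q for p, q in zip(prompt, pooled)]


# Hypothetical 2-D example: the middle pixel belongs to another object.
text_init = [1.0, 0.0]
features = [[2.0, 0.0], [0.0, 5.0], [2.0, 0.0]]
mask = [1, 0, 1]
updated = masked_prompt_transfer(text_init, features, mask)
print(updated)
```

Because the background pixel is masked out, its strong activation never reaches the prompt. This is the class-aware behaviour the summary describes, shown here only in a simplified form.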