Abstract: We present Label Anything, an innovative neural network architecture designed for few-shot semantic segmentation (FSS) that demonstrates remarkable generalizability across multiple classes with minimal examples required per class. Diverging from traditional FSS methods that predominantly rely on masks for annotating support images, Label Anything introduces varied visual prompts -- points, bounding boxes, and masks -- thereby enhancing the framework's versatility and adaptability. Unique to our approach, Label Anything is engineered for end-to-end training across multi-class FSS scenarios, efficiently learning from diverse support set configurations without retraining. This approach enables a "universal" application to various FSS challenges, ranging from $1$-way $1$-shot to complex $N$-way $K$-shot configurations while remaining agnostic to the specific number of class examples. This innovative training strategy reduces computational requirements and substantially improves the model's adaptability and generalization across diverse segmentation tasks. Our comprehensive experimental validation, particularly achieving state-of-the-art results on the COCO-$20^i$ benchmark, underscores Label Anything's robust generalization and flexibility. The source code is publicly available at: https://github.com/pasqualedem/LabelAnything

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses several key problems in few-shot semantic segmentation (FSS):

1. **Multi-class few-shot segmentation**: Traditional FSS methods mainly target the binary case, separating a single foreground class from the background. Practical applications, however, often require segmenting several classes at once. The paper proposes an FSS method that handles multiple classes simultaneously, which makes the model more flexible and more broadly applicable.
2. **Diverse visual prompts**: Traditional FSS methods usually rely on masks to annotate the support set. The paper accepts several kinds of visual prompts (points, bounding boxes, and masks), which makes the framework more versatile and adaptable.
3. **End-to-end training**: The model learns from varied support set configurations and prompt types in a single training run, with no retraining. It can therefore be applied across FSS scenarios, from 1-way 1-shot to complex N-way K-shot configurations, without being tied to a specific number of classes or examples per class.
4. **Reducing the annotation burden**: By accepting diverse visual prompts, the method reduces its dependence on large amounts of annotated data, which lowers annotation cost and time.
5. **Improving generalization**: The method achieves state-of-the-art results on the COCO-$20^i$ benchmark, demonstrating strong generalization and flexibility on unseen classes.

### Summary

The Label Anything (LA) model proposed in this paper addresses the key challenges of multi-class few-shot semantic segmentation by combining diverse visual prompts with an end-to-end training strategy, improving the model's flexibility, adaptability, and generalization. These innovations are not only technical advances but also matter in practice, especially when annotation resources are limited.
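To make the N-way K-shot terminology and the three prompt types concrete, the following is a minimal sketch of how a support set with mixed prompts might be represented. It is not taken from the Label Anything codebase: the class names `SupportExample` and `Episode`, their fields, and the image identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[int, int]           # (x, y) pixel coordinate
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max)
Mask = List[List[int]]            # binary mask stored as rows of 0/1


@dataclass
class SupportExample:
    """One annotated support image for a single class (hypothetical layout)."""
    image_id: str
    points: List[Point] = field(default_factory=list)
    boxes: List[Box] = field(default_factory=list)
    masks: List[Mask] = field(default_factory=list)

    def prompt_types(self) -> List[str]:
        """Return the names of the prompt types present on this example."""
        present = []
        if self.points:
            present.append("point")
        if self.boxes:
            present.append("box")
        if self.masks:
            present.append("mask")
        return present


@dataclass
class Episode:
    """A support set grouped by class name, plus the query image to segment."""
    query_image_id: str
    support: Dict[str, List[SupportExample]]

    @property
    def n_way(self) -> int:
        """Number of classes in the episode (the N in N-way)."""
        return len(self.support)

    def shots_per_class(self) -> Dict[str, int]:
        """Number of support examples per class (K may differ across classes)."""
        return {cls: len(examples) for cls, examples in self.support.items()}

    def validate(self) -> None:
        """Require at least one class, one example per class, one prompt per example."""
        if self.n_way == 0:
            raise ValueError("episode needs at least one class")
        for cls, examples in self.support.items():
            if not examples:
                raise ValueError(f"class {cls!r} has no support examples")
            for example in examples:
                if not example.prompt_types():
                    raise ValueError(f"{example.image_id!r} carries no prompt")


# A 2-way episode with an uneven number of shots and mixed prompt types.
episode = Episode(
    query_image_id="query_0",
    support={
        "cat": [
            SupportExample("img_1", points=[(12, 30)]),
            SupportExample("img_2", boxes=[(5, 5, 40, 60)]),
        ],
        "dog": [SupportExample("img_3", masks=[[[0, 1], [1, 1]]])],
    },
)
episode.validate()
print(episode.n_way, episode.shots_per_class())
```

The example deliberately gives the two classes different numbers of shots and different prompt types, mirroring the summary's point that a single model is meant to cope with varied support set configurations.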

Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

Iterative Few-shot Semantic Segmentation from Image Label Text

ADeLA: Automatic Dense Labeling with Attention for Viewpoint Shift in Semantic Segmentation

Few-shot Class-Incremental Semantic Segmentation via Pseudo-Labeling and Knowledge Distillation

Segment Any Class (SAC): Multi-Class Few-Shot Semantic Segmentation via Class Region Proposals

Self-supervised Few-shot Learning for Semantic Segmentation: An Annotation-free Approach

MetaMask: Improving Few-Shot Semantic Segmentation Via Multi-Mask Calibration

MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping

Dual Branch Multi-Level Semantic Learning for Few-Shot Segmentation

Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation

A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models

Break the Bias: Delving Semantic Transform Invariance for Few-Shot Segmentation

Self-Support Matching Networks with Multiscale Attention for Few-shot Semantic Segmentation

Objectness-Aware Few-Shot Semantic Segmentation

Bridge the Points: Graph-based Few-shot Segment Anything Semantically

Label-Efficient Few-Shot Semantic Segmentation with Unsupervised Meta-Training

CAFS: Class Adaptive Framework for Semi-Supervised Semantic Segmentation

Sam-Rsp: A New Few-Shot Segmentation Method Based on Segment Anything Model and Rough Segmentation Prompts

APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

SSA-Seg: Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation

Better Call SAL: Towards Learning to Segment Anything in Lidar