Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

Pasquale De Marinis, Nicola Fanelli, Raffaele Scaringi, Emanuele Colonna, Giuseppe Fiameni, Gennaro Vessio, Giovanna Castellano
2024-07-02
Abstract: We present Label Anything, an innovative neural network architecture designed for few-shot semantic segmentation (FSS) that demonstrates remarkable generalizability across multiple classes with minimal examples required per class. Diverging from traditional FSS methods that predominantly rely on masks for annotating support images, Label Anything introduces varied visual prompts -- points, bounding boxes, and masks -- thereby enhancing the framework's versatility and adaptability. Unique to our approach, Label Anything is engineered for end-to-end training across multi-class FSS scenarios, efficiently learning from diverse support set configurations without retraining. This approach enables a "universal" application to various FSS challenges, ranging from $1$-way $1$-shot to complex $N$-way $K$-shot configurations while remaining agnostic to the specific number of class examples. This innovative training strategy reduces computational requirements and substantially improves the model's adaptability and generalization across diverse segmentation tasks. Our comprehensive experimental validation, particularly achieving state-of-the-art results on the COCO-$20^i$ benchmark, underscores Label Anything's robust generalization and flexibility. The source code is publicly available at: https://github.com/pasqualedem/LabelAnything
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses several key problems in few-shot semantic segmentation (FSS):

1. **Multi-class few-shot segmentation**: Traditional FSS methods mainly focus on binary segmentation, that is, separating foreground from background. Practical applications, however, often require segmenting several classes at once. This paper proposes an FSS method that handles multiple classes simultaneously, making the model more flexible and more widely applicable.
2. **Diverse visual prompts**: Traditional FSS methods usually rely on masks to annotate the support set. This paper introduces several kinds of visual prompts (points, bounding boxes, and masks), which makes the framework more versatile and adaptable.
3. **End-to-end training**: The proposed method learns from different support set configurations and prompt types in a single training run, with no retraining. The model can therefore be applied across FSS scenarios, from 1-way 1-shot to complex N-way K-shot configurations, without being tied to a specific number of classes or examples per class.
4. **Reducing the annotation burden**: By accepting points and bounding boxes in addition to masks, the method reduces the dependence on large amounts of annotated data, lowering annotation cost and time.
5. **Improving the generalization ability of the model**: The method achieves state-of-the-art results on the COCO-$20^i$ benchmark, demonstrating strong generalization and flexibility on unseen classes.

### Summary

The Label Anything (LA) model proposed in this paper addresses the key challenges of multi-class few-shot semantic segmentation by introducing diverse visual prompts and an end-to-end training strategy, improving the model's flexibility, adaptability, and generalization ability.
These innovations are technical advances that also matter in practice, especially when annotation resources are limited.
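
The N-way K-shot setting mentioned above can be made concrete with a small sketch of how a support set might be assembled when both the number of shots per class and the prompt type vary. This is an illustration only, not the authors' implementation: the names `SupportExample`, `Episode`, and `sample_episode`, the uniform sampling scheme, and the toy data are all assumptions introduced here.

```python
import random
from dataclasses import dataclass, field

# The three prompt kinds named in the text.
PROMPT_TYPES = ("point", "box", "mask")


@dataclass
class SupportExample:
    image_id: str     # hypothetical image identifier
    class_id: int     # class annotated in this support image
    prompt_type: str  # one of PROMPT_TYPES


@dataclass
class Episode:
    class_ids: list
    support: list = field(default_factory=list)

    @property
    def n_way(self):
        # N in "N-way": number of classes in the episode.
        return len(self.class_ids)

    def shots_per_class(self):
        # K in "K-shot", counted separately for every class.
        counts = {c: 0 for c in self.class_ids}
        for example in self.support:
            counts[example.class_id] += 1
        return counts


def sample_episode(images_by_class, n_way, max_shots, rng):
    """Sample an N-way episode; each class gets between 1 and max_shots examples."""
    class_ids = sorted(rng.sample(sorted(images_by_class), n_way))
    episode = Episode(class_ids=class_ids)
    for c in class_ids:
        # The shot count may differ from class to class.
        k = rng.randint(1, min(max_shots, len(images_by_class[c])))
        for image_id in rng.sample(images_by_class[c], k):
            # Assumption: prompt type is drawn uniformly at random.
            prompt = rng.choice(PROMPT_TYPES)
            episode.support.append(SupportExample(image_id, c, prompt))
    return episode


# Toy data: hypothetical class ids mapped to hypothetical image ids.
toy = {c: [f"img_{c}_{i}" for i in range(5)] for c in range(4)}
episode = sample_episode(toy, n_way=3, max_shots=2, rng=random.Random(0))
print(episode.n_way, episode.shots_per_class())
```

A model that is agnostic to the support set configuration would need to accept any `Episode` produced this way, whether it holds one class with one mask or several classes with a mix of points and boxes.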