Segment Using Just One Example

Pratik Vora, Sudipan Saha
2024-08-14
Abstract: Semantic segmentation is an important topic in computer vision with many relevant applications in Earth observation. While supervised methods exist, the constraints of limited annotated data have encouraged the development of unsupervised approaches. However, existing unsupervised methods resemble clustering and cannot be directly mapped to explicit target classes. In this paper, we deal with single-shot semantic segmentation, where one example of the target class is provided and used to segment the target class from query/test images. Our approach exploits the recently popular Segment Anything (SAM), a promptable foundation model. We specifically design several techniques to automatically generate prompts from the only example/key image in such a way that the segmentation is successfully achieved on a stitch or concatenation of the example/key and query/test images. The proposed technique does not involve any training phase and requires just one example image to grasp the concept. Furthermore, no text-based prompt is required for the proposed method. We evaluated the proposed techniques on the building and car classes.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper aims to address the problem of one-shot semantic segmentation. Specifically, the paper proposes a method that can segment a target category in a query image using only one example image of the target category, without any training process. This method is particularly suitable for target detection scenarios that lack prior knowledge, such as quickly identifying specific targets in disaster management.

The main contributions include:

1. **Problem Statement**: Investigated the problem of performing semantic segmentation on any query image using only one example image of a specific category.
2. **Solution**: Utilized the recently popular Segment Anything Model (SAM) to solve the aforementioned problem. Proposed several novel automatic prompt generation techniques to enhance the effectiveness of SAM.
3. **Evaluation**: Evaluated the proposed method on two categories: buildings and cars. While the choice of these two categories was mainly to demonstrate the challenge, the practical value of the method lies in handling unknown categories, especially in scenarios like disaster management.

This method does not rely on a training phase, thereby reducing computational requirements and improving data efficiency, making it suitable for real-time applications in dynamic environments. Additionally, the method does not require text-based prompts and can complete the task using only a single example image. This is particularly useful for certain practical applications where describing the target may be difficult, while directly marking the target in an image is more intuitive.
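
The stitch-and-prompt idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the side-by-side layout, the random sampling of foreground points, and the helper names `stitch`, `sample_point_prompts`, and `crop_query_mask` are all assumptions made for illustration, and the paper's actual prompt generation techniques are not specified in this summary. The sketch covers only the bookkeeping around the model (building the stitched image, deriving point prompts from the key image's mask, and cutting the query region out of a predicted mask); the SAM call itself is left out.

```python
import numpy as np


def stitch(key_img: np.ndarray, query_img: np.ndarray) -> np.ndarray:
    """Concatenate key and query images side by side (key on the left).

    Assumption: both images are H x W x C arrays with the same height.
    """
    if key_img.shape[0] != query_img.shape[0]:
        raise ValueError("key and query images must share the same height")
    return np.concatenate([key_img, query_img], axis=1)


def sample_point_prompts(key_mask: np.ndarray, n_points: int = 5, seed: int = 0):
    """Sample foreground pixels of the key mask as (x, y) point prompts.

    Because the key image sits on the left of the stitched image, its pixel
    coordinates are unchanged in the stitched frame. Random sampling is a
    stand-in (assumption) for the paper's prompt generation techniques.
    """
    ys, xs = np.nonzero(key_mask)
    if xs.size == 0:
        raise ValueError("key mask contains no foreground pixels")
    rng = np.random.default_rng(seed)
    idx = rng.choice(xs.size, size=min(n_points, xs.size), replace=False)
    points = np.stack([xs[idx], ys[idx]], axis=1)  # shape (N, 2), (x, y) order
    labels = np.ones(idx.size, dtype=int)          # 1 marks a foreground point
    return points, labels


def crop_query_mask(stitched_mask: np.ndarray, key_width: int) -> np.ndarray:
    """Keep only the part of a stitched-image mask that covers the query image."""
    return stitched_mask[:, key_width:]
```

In use, the stitched image and the sampled points would be handed to a promptable segmenter, and `crop_query_mask` would then extract the query-side prediction. The `(x, y)` point layout with a label of 1 for foreground follows the usual convention for SAM point prompts, but how well prompts placed only on the key half transfer to the query half is exactly what the paper investigates and is not demonstrated by this sketch.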