IFSENet: Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence

Shreyas Chandgothia, Ardhendu Sekhar, Amit Sethi
2024-03-22
Abstract: Training a computer vision system to segment a novel class typically requires collecting and painstakingly annotating many images containing objects from that class. Few-shot segmentation techniques reduce the number of images required to learn to segment a new class, but careful annotations of object boundaries are still required. On the other hand, interactive segmentation techniques focus only on incrementally improving the segmentation of one object at a time (typically, using clicks given by an expert) in a class-agnostic manner. We combine the two concepts to drastically reduce the effort required to train segmentation models for novel classes. Instead of trivially feeding interactive segmentation masks as ground truth to a few-shot segmentation model, we propose IFSENet, which can accept sparse supervision on a single or a few support images in the form of clicks to generate masks on support (training, clicked upon at least once) as well as query (test, never clicked upon) images. To trade off effort for accuracy flexibly, images and clicks can be incrementally added to the support set to further improve the segmentation of support as well as query images. The proposed model approaches the accuracy of previous state-of-the-art few-shot segmentation models with considerably lower annotation effort (clicks instead of maps), when tested on query images from the Pascal and SBD datasets. It also works well as an interactive segmentation method on support images.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper proposes a new model framework called IFSENet (Interactive Few-Shot Segmentation Excellence Network), which combines the advantages of interactive segmentation and few-shot segmentation to reduce the annotation workload required for training segmentation models. Specifically, IFSENet segments images of a new class from only a few support samples (few-shot), using clicks provided by the user as sparse supervision. Unlike traditional few-shot segmentation methods, which require carefully annotated support masks, IFSENet only needs the user to click on the support image to indicate the target area. This allows the model not only to segment unseen categories but also to iteratively improve segmentation quality as more clicks or support images are added.

The key contributions of IFSENet are:

1. **Combining two methods**: It combines interactive segmentation with few-shot segmentation, using clicks as input to guide the model in learning how to segment new categories.
2. **Handling support and query images**: The model produces masks both for support images, which carry clicks, and for query images, which have never been clicked, enabling segmentation of new categories.
3. **Iterative improvement**: Users can keep improving segmentation quality by adding more clicks or support images.
4. **Performance validation**: Experimental results show that, on query images from the Pascal and SBD datasets, IFSENet approaches the accuracy of previous state-of-the-art few-shot segmentation models with considerably lower annotation effort (clicks instead of full masks).

In summary, IFSENet segments new categories by integrating interactive segmentation and few-shot segmentation techniques, while greatly reducing the manual annotation workload.
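To make "clicks as sparse supervision" and "iterative improvement" more concrete, here is a minimal Python sketch. It is not taken from the paper: the names `Click`, `click_maps`, `iou`, and `next_click` are hypothetical, the disk encoding of clicks is one common convention in interactive segmentation that IFSENet may or may not use, and the "annotator" is a simulated one that simply clicks the first mislabelled pixel.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Mask = List[List[int]]  # 0/1 grid in row-major order


@dataclass(frozen=True)
class Click:
    """A single user click (hypothetical structure, not from the paper)."""
    row: int
    col: int
    positive: bool  # True = on the object, False = on the background


def click_maps(clicks: List[Click], height: int, width: int,
               radius: int = 1) -> Tuple[Mask, Mask]:
    """Rasterise clicks into two binary maps (positive, negative) of small disks.

    Assumption: disk encoding is used here only for illustration.
    """
    pos = [[0] * width for _ in range(height)]
    neg = [[0] * width for _ in range(height)]
    for c in clicks:
        target = pos if c.positive else neg
        for r in range(max(0, c.row - radius), min(height, c.row + radius + 1)):
            for q in range(max(0, c.col - radius), min(width, c.col + radius + 1)):
                if (r - c.row) ** 2 + (q - c.col) ** 2 <= radius ** 2:
                    target[r][q] = 1
    return pos, neg


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    return inter / union if union else 1.0


def next_click(pred: Mask, truth: Mask) -> Optional[Click]:
    """Simulated annotator: click the first mislabelled pixel, if any.

    A missed object pixel yields a positive click; a false alarm yields
    a negative click. Returns None when the prediction is already exact.
    """
    for r, (pr, tr) in enumerate(zip(pred, truth)):
        for q, (p, t) in enumerate(zip(pr, tr)):
            if p != t:
                return Click(r, q, positive=bool(t))
    return None
```

In an interactive loop under these assumptions, one would call `next_click` on the current prediction, append the result to the click list, re-encode with `click_maps`, and feed the maps together with the support image back into the segmentation model until `iou` is acceptable. The model itself is deliberately left out, since the summary above does not describe its architecture.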