Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models

Xiyu Qi, Yifan Wu, Yongqiang Mao, Wenhui Zhang, Yidan Zhang
2023-11-22
Abstract: The Segment Anything Model (SAM) exhibits remarkable versatility and zero-shot learning abilities, owing largely to its extensive training data (SA-1B). Recognizing SAM's dependency on manual guidance given its category-agnostic nature, we identified unexplored potential within few-shot semantic segmentation tasks for remote sensing imagery. This research introduces a structured framework designed for the automation of few-shot semantic segmentation. It utilizes the SAM model and facilitates a more efficient generation of semantically discernible segmentation outcomes. Central to our methodology is a novel automatic prompt learning approach, leveraging prior guided masks to produce coarse pixel-wise prompts for SAM. Extensive experiments on the DLRSD dataset underline the superiority of our approach, outperforming other available few-shot methodologies.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to solve several key problems in semantic segmentation of remote sensing images:

1. **Dependence on Manual Guidance**: SAM is category-agnostic, so it relies on manually supplied guidance to produce semantically meaningful segmentations. For remote sensing images this manual effort is time-consuming and error-prone. The proposed method aims to reduce the dependence on manual intervention and achieve automatic segmentation.
2. **Application of Few-shot Learning**: Although existing large-scale vision models (such as the Segment Anything Model, SAM) perform well in zero-shot settings, their use in few-shot scenarios remains underexplored, especially on remote sensing images. By introducing a self-guided few-shot learning framework, this paper explores how to improve segmentation performance with limited labeled data.
3. **Automated Prompt Generation**: To overcome SAM's class-agnostic nature, this paper proposes an automatic prompt learning technique that uses prior-guided masks to generate coarse pixel-level prompts, thereby guiding the model toward more accurate segmentation.

Specifically, the main contributions of this paper include:

- Introducing a new few-shot semantic segmentation framework (Self-guided Large Vision Model, Few-shot SLVM), which significantly reduces the dependence on a large amount of manually labeled data.
- Proposing an "automatic prompt learning" technique, which uses prior-guided masks to generate coarse pixel-level prompts for a pre-trained large-scale vision model (such as SAM) and improves the accuracy of segmentation.
- Verifying, through extensive experiments on the DLRSD dataset, that the method outperforms other available few-shot approaches on remote sensing image segmentation.

Together, these contributions allow the proposed framework to segment remote sensing images more efficiently and accurately from only a few labeled examples.
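The summary above does not specify how the prior-guided mask or the coarse prompts are computed. As a rough illustration only, the sketch below assumes one common few-shot recipe: build a foreground prototype from a support image by masked average pooling, score each query pixel by cosine similarity to that prototype, and threshold the result into a coarse mask plus a few high-confidence points. The function names, the threshold of 0.5, and the choice of three points are all hypothetical and are not taken from the paper. SAM itself is not called here; the outputs are the kind of mask or point prompt that could then be handed to a promptable model.

```python
import numpy as np


def masked_average_prototype(support_feat, support_mask):
    """Average the support features over foreground pixels.

    support_feat: (C, H, W) feature map; support_mask: (H, W) binary mask.
    Returns a (C,) prototype vector. Hypothetical helper, not from the paper.
    """
    weights = support_mask.astype(np.float64)
    total = weights.sum()
    if total == 0:
        raise ValueError("support mask has no foreground pixels")
    return (support_feat * weights[None]).sum(axis=(1, 2)) / total


def prior_mask(query_feat, prototype, eps=1e-8):
    """Cosine similarity of every query pixel to the prototype, scaled to [0, 1]."""
    c, h, w = query_feat.shape
    flat = query_feat.reshape(c, -1)  # (C, H*W)
    norms = np.linalg.norm(prototype) * np.linalg.norm(flat, axis=0)
    sim = (prototype @ flat) / (norms + eps)
    sim = (sim - sim.min()) / (sim.max() - sim.min() + eps)  # min-max normalise
    return sim.reshape(h, w)


def coarse_prompts(prior, threshold=0.5, num_points=3):
    """Turn a prior map into a coarse binary mask and (x, y) point prompts.

    The threshold and number of points are illustrative assumptions.
    """
    coarse = prior >= threshold
    top = np.argsort(prior, axis=None)[::-1][:num_points]  # most confident pixels
    ys, xs = np.unravel_index(top, prior.shape)
    points = np.stack([xs, ys], axis=1)
    return coarse, points
```

With real data, `support_feat` and `query_feat` would come from a shared feature extractor applied to the labeled support image and the unlabeled query image, and the resulting mask or points would replace the prompts a user would otherwise supply by hand.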