RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

Jie Zhang, Xubing Yang, Rui Jiang, Wei Shao, Li Zhang
2024-02-29
Abstract: The development of high-resolution remote sensing satellites has greatly facilitated research in remote sensing. Segmenting and extracting specific targets are essential tasks when working with vast and complex remote sensing images. Recently, the Segment Anything Model (SAM) has provided a universal pre-trained model for image segmentation tasks. Because applying SAM directly to remote sensing image segmentation does not yield satisfactory results, we propose RSAM-Seg, which stands for Remote Sensing SAM with Semantic Segmentation. RSAM-Seg is a tailored modification of SAM for the remote sensing field that eliminates the need for manual intervention to provide prompts. Adapter-Scale, a set of supplementary scaling modules, is proposed in the multi-head attention blocks of the encoder part of SAM. Furthermore, Adapter-Feature modules are inserted between the Vision Transformer (ViT) blocks. These modules aim to incorporate high-frequency image information and image embedding features to generate image-informed prompts. Experiments are conducted on four distinct remote sensing scenarios, encompassing cloud detection, field monitoring, building detection and road mapping tasks. The experimental results not only show improvement over the original SAM and U-Net across the cloud, building, field and road scenarios, but also highlight the capacity of RSAM-Seg to discern areas missing from the ground truth of certain datasets, affirming its potential as an auxiliary annotation method. In addition, its performance in few-shot scenarios is commendable, underscoring its potential for dealing with limited datasets.
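The abstract describes Adapter-Scale only as "supplementary scaling modules" placed in SAM's attention blocks. A common pattern for such modules is a bottleneck adapter. It projects a token down to a small dimension, applies a nonlinearity, projects back up, multiplies the result by a scale factor and adds it to the input, as in the scaled AdaptMLP of AdaptFormer. The sketch below assumes that pattern. The function names, the GELU activation and the weight shapes are illustrative assumptions, not details taken from RSAM-Seg.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gelu(v: float) -> float:
    """Tanh approximation of the GELU activation."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def scaled_adapter(x: Vector, w_down: Matrix, w_up: Matrix, scale: float) -> Vector:
    """Hypothetical scaled bottleneck adapter: x + scale * W_up(GELU(W_down x)).

    Assumption: Adapter-Scale follows this generic pattern; the paper's exact
    placement inside the attention block and its scale value are not given here.
    w_down has shape (r, d) and w_up has shape (d, r), with bottleneck size r < d.
    """
    hidden = [gelu(h) for h in matvec(w_down, x)]  # down-project to r dims, then apply the nonlinearity
    delta = matvec(w_up, hidden)                    # up-project back to d dims
    return [xi + scale * di for xi, di in zip(x, delta)]  # scaled residual update


if __name__ == "__main__":
    token = [1.0, -1.0]
    print(scaled_adapter(token, [[1.0, 0.0]], [[1.0], [0.0]], scale=0.5))
```

With `w_up` set to zeros or `scale` set to 0, the adapter returns its input unchanged. This is a typical reason for the design: fine-tuning can start from the pre-trained encoder's behavior and train only the small projection matrices.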
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper adapts the general-purpose Segment Anything Model (SAM) to semantic segmentation of remote sensing images. SAM performs poorly when applied directly to such images. The authors therefore propose RSAM-Seg (Remote Sensing SAM for semantic segmentation), which adapts SAM to the characteristics of the remote sensing domain through two kinds of modules. Adapter-Scale modules are placed in the multi-head attention blocks of SAM's image encoder, and Adapter-Feature modules are inserted between its Vision Transformer (ViT) blocks. Together they capture high-frequency image information and image embedding features to generate image-informed prompts, reducing the reliance on manually supplied prompts.

The paper identifies several challenges in remote sensing image segmentation. Pixel values show large intra-class variance and small inter-class variance, and annotated data are often scarce or of uneven quality. RSAM-Seg addresses these issues by integrating domain-specific prior knowledge into SAM, which improves its adaptability to remote sensing segmentation tasks.

RSAM-Seg is evaluated on four remote sensing scenarios: cloud detection, field monitoring, building detection and road mapping. It outperforms the original SAM and U-Net in all four scenarios. It can also identify target areas that are missing from the ground truth of some datasets, which suggests it could serve as an auxiliary annotation tool. Finally, it performs well in few-shot settings, indicating potential for tasks with limited labeled data.
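The summary does not say how the high-frequency information is extracted; FFT-based filtering is one common choice in related work. As a simpler, hypothetical stand-in, the sketch below computes a spatial high-pass component, which is each pixel minus the mean of its neighborhood. Flat regions such as uniform cloud or water go to zero, while edges and fine textures such as building outlines or road borders remain. The function names and the box-window size are assumptions for illustration only.

```python
from typing import List

Image = List[List[float]]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Mean over a (2r+1) x (2r+1) window, using only pixels inside the image at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += img[y][x]
                        count += 1
            out[i][j] = total / count
    return out


def high_frequency(img: Image, radius: int = 1) -> Image:
    """Hypothetical high-pass map: image minus its low-pass (box-blurred) version.

    Assumption: a spatial filter stands in for whatever high-frequency
    extraction RSAM-Seg actually uses.
    """
    low = box_blur(img, radius)
    return [[p - q for p, q in zip(row, low_row)] for row, low_row in zip(img, low)]


if __name__ == "__main__":
    spike = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    for row in high_frequency(spike):
        print(row)
```

In an adapter-based design, a map like this, or features computed from it, could be fed to the adapters alongside the image embeddings. How RSAM-Seg combines the two is not specified in this summary.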