Zero-Shot Refinement of Buildings' Segmentation Models using SAM

Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor, Ali J. Ghandour

2024-02-11

Abstract: Foundation models have excelled in various tasks but are often evaluated on general benchmarks. The adaptation of these models to specific domains, such as remote sensing imagery, remains underexplored. In remote sensing, precise building instance segmentation is vital for applications like urban planning. While Convolutional Neural Networks (CNNs) perform well, their generalization can be limited. To this end, we present a novel approach that adapts foundation models to address the generalization shortfall of existing models. Among several models, we focus on the Segment Anything Model (SAM), a potent foundation model renowned for its class-agnostic image segmentation capabilities. We start by identifying the limitations of SAM, revealing its suboptimal performance when applied to remote sensing imagery. Moreover, SAM does not offer recognition abilities and thus fails to classify and tag localized objects. To address these limitations, we introduce different prompting strategies, including integrating a pre-trained CNN as a prompt generator. This novel approach augments SAM with recognition abilities, a first of its kind. We evaluated our method on three remote sensing datasets: the WHU Buildings dataset, the Massachusetts Buildings dataset, and the AICrowd Mapping Challenge. For out-of-distribution performance on the WHU dataset, we achieve a 5.47% increase in IoU and a 4.81% improvement in F1-score. For in-distribution performance on the WHU dataset, we observe a 2.72% and 1.58% increase in True-Positive-IoU and True-Positive-F1 score, respectively. Our code is publicly available at https://github.com/geoaigroup/GEOAI-ECRS2023, in the hope of inspiring further exploration of foundation models for domain-specific tasks within the remote sensing community.
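As a reading aid for the IoU and F1-score figures quoted in the abstract, the following is a minimal sketch of the standard pixel-wise definitions for binary masks. It is not taken from the authors' repository: the function name `iou_and_f1` and the handling of two empty masks are assumptions made for illustration, and the True-Positive-IoU and True-Positive-F1 variants mentioned in the abstract are not defined on this page, so they are not reproduced here.

```python
import numpy as np

def iou_and_f1(pred, target):
    """Pixel-wise IoU and F1 (Dice) between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()    # predicted building, is building
    fp = np.logical_and(pred, ~target).sum()   # predicted building, is background
    fn = np.logical_and(~pred, target).sum()   # missed building pixels
    union = tp + fp + fn
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return float(iou), float(f1)

# Toy example: two 4x4 masks that overlap on a single row.
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, :] = True      # rows 0-1
target = np.zeros((4, 4), dtype=bool)
target[1:3, :] = True    # rows 1-2
iou, f1 = iou_and_f1(pred, target)
print(f"IoU={iou:.3f}, F1={f1:.3f}")
```

For a single pair of masks the two metrics are tied by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU; figures averaged over many images or instances need not satisfy this identity exactly.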

Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language, Machine Learning

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to effectively use foundation models, in particular Meta's Segment Anything Model (SAM), to improve building instance segmentation in remote sensing images. Specifically, the paper focuses on:

1. **Insufficient generalization ability of existing models**: Current state-of-the-art remote sensing models (such as CNN-based models) generalize poorly across regions, seasons, and acquisition periods because the imagery varies widely.
2. **Limitations of the SAM model**: Although SAM performs excellently at image segmentation, it lacks recognition ability and cannot classify or label the objects it segments. Its performance on remote sensing images also falls short of expectations.

To solve these problems, the authors propose a method that enhances SAM through prompt engineering, enabling it to better adapt to building instance segmentation in remote sensing images. Specific methods include:

- **Integrating a pre-trained CNN as a prompt generator**: A pre-trained CNN model generates building segmentation masks, which are converted into different prompt types (such as single-point, multi-point, and bounding box) to guide SAM toward more accurate segmentation.
- **Exploring multiple prompt strategies**: The authors experiment with single-point prompts, multi-point prompts (randomly distributed or placed along the mask skeleton), and bounding box prompts to find the optimal combination.

With this method, the authors aim to improve the accuracy and robustness of SAM for building instance segmentation in remote sensing images without retraining it. On the WHU Buildings dataset, the reported out-of-distribution gains are 5.47% in IoU and 4.81% in F1-score, and the in-distribution gains are 2.72% in True-Positive-IoU and 1.58% in True-Positive-F1.
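The mask-to-prompt conversion described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the helper names (`mask_to_box`, `mask_to_center_point`, `mask_to_random_points`), the choice of the foreground pixel nearest the centroid as the single-point prompt, and the fixed random seed are all assumptions. The input is assumed to be one binary mask per building instance, as produced by a CNN. Skeleton-based point prompts are omitted because they would need an extra morphology library.

```python
import numpy as np

def _foreground_coords(mask):
    """Row (y) and column (x) indices of foreground pixels; fails on empty masks."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask has no foreground pixels")
    return ys, xs

def mask_to_box(mask):
    """Tight bounding box in XYXY order: [x_min, y_min, x_max, y_max]."""
    ys, xs = _foreground_coords(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

def mask_to_center_point(mask):
    """Single-point prompt (x, y): the foreground pixel nearest the centroid.

    Snapping to a foreground pixel keeps the point inside the building even
    for concave (e.g. L-shaped) footprints. This choice is an assumption.
    """
    ys, xs = _foreground_coords(mask)
    cy, cx = ys.mean(), xs.mean()
    nearest = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    return np.array([xs[nearest], ys[nearest]])

def mask_to_random_points(mask, k, seed=0):
    """Multi-point prompt: up to k distinct foreground pixels as (x, y) rows."""
    ys, xs = _foreground_coords(mask)
    rng = np.random.default_rng(seed)
    idx = rng.choice(ys.size, size=min(k, ys.size), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)

# Toy CNN output: one rectangular building in an 8x8 tile.
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True

box = mask_to_box(mask)
center = mask_to_center_point(mask)
points = mask_to_random_points(mask, k=3)
print("box:", box)
print("center point:", center)
print("random points:", points.tolist())
```

If the `segment_anything` package is used, arrays of this shape correspond to the `box` and `point_coords` arguments of `SamPredictor.predict`, with `point_labels` set to 1 for each foreground point. That wiring is left out here so the sketch runs without model weights.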

The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot

RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

Leveraging Segment Anything Model in Identifying Buildings within Refugee Camps (SAM4Refugee) from Satellite Imagery for Humanitarian Operations

MeSAM: Multiscale Enhanced Segment Anything Model for Optical Remote Sensing Images

SAModified: A Foundation Model-Based Zero-Shot Approach for Refining Noisy Land-Use Land-Cover Maps

GeoSAM: Fine-tuning SAM with Multi-Modal Prompts for Mobility Infrastructure Segmentation

SAM Fails to Segment Anything? – SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

Accurate, automatic zero-shot wetland mapping from high resolution remote sensing imagery by prompting large foundation model (Segment Anything Model-SAM)

Segment anything, from space?

SAM-Adapter: Adapting Segment Anything in Underperformed Scenes

Evaluating the Efficacy of Segment Anything Model for Delineating Agriculture and Urban Green Spaces in Multiresolution Aerial and Spaceborne Remote Sensing Images

Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation

Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)

SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints

Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models

SAM-RSIS: Progressively Adapting SAM With Box Prompting to Remote Sensing Image Instance Segmentation

Integrated Framework for Unsupervised Building Segmentation with Segment Anything Model-Based Pseudo-Labeling and Weakly Supervised Learning

Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models

Can SAM recognize crops? Quantifying the zero-shot performance of a semantic segmentation foundation model on generating crop-type maps using satellite imagery for precision agriculture