Zero-Shot Refinement of Buildings' Segmentation Models using SAM

Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor, Ali J. Ghandour
2024-02-11
Abstract: Foundation models have excelled in various tasks but are often evaluated on general benchmarks. The adaptation of these models for specific domains, such as remote sensing imagery, remains an underexplored area. In remote sensing, precise building instance segmentation is vital for applications like urban planning. While Convolutional Neural Networks (CNNs) perform well, their generalization can be limited. For this aim, we present a novel approach to adapt foundation models to address existing models' generalization dropback. Among several models, our focus centers on the Segment Anything Model (SAM), a potent foundation model renowned for its prowess in class-agnostic image segmentation capabilities. We start by identifying the limitations of SAM, revealing its suboptimal performance when applied to remote sensing imagery. Moreover, SAM does not offer recognition abilities and thus fails to classify and tag localized objects. To address these limitations, we introduce different prompting strategies, including integrating a pre-trained CNN as a prompt generator. This novel approach augments SAM with recognition abilities, a first of its kind. We evaluated our method on three remote sensing datasets, including the WHU Buildings dataset, the Massachusetts Buildings dataset, and the AICrowd Mapping Challenge. For out-of-distribution performance on the WHU dataset, we achieve a 5.47% increase in IoU and a 4.81% improvement in F1-score. For in-distribution performance on the WHU dataset, we observe a 2.72% and 1.58% increase in True-Positive-IoU and True-Positive-F1 score, respectively. Our code is publicly available at https://github.com/geoaigroup/GEOAI-ECRS2023, hoping to inspire further exploration of foundation models for domain-specific tasks within the remote sensing community.
Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper asks how foundation models, in particular Meta's Segment Anything Model (SAM), can be used to improve building instance segmentation in remote sensing images. It focuses on two issues:

1. **Limited generalization of existing models**: Current state-of-the-art remote sensing models (such as CNN-based models) generalize poorly across regions, seasons, and time periods because the imagery varies so widely.
2. **Limitations of SAM**: Although SAM performs well at class-agnostic image segmentation, it has no recognition ability, so it cannot classify or label the objects it segments, and its performance on remote sensing images falls short of expectations.

To address these issues, the authors propose enhancing SAM through prompt engineering so that it adapts better to building instance segmentation in remote sensing images. The method has two parts:

- **Integrating a pre-trained CNN as a prompt generator**: A pre-trained CNN model produces building segmentation masks, which are converted into different prompt types (single point, multiple points, bounding box, etc.) that guide SAM toward a more accurate segmentation.
- **Exploring multiple prompting strategies**: The authors experiment with single-point prompts, multi-point prompts (randomly distributed or placed along the mask skeleton), and bounding-box prompts to find the best combination.

With this approach, the authors aim to improve the accuracy and robustness of SAM for building instance segmentation in remote sensing images without retraining it. On the WHU Buildings dataset, they report out-of-distribution gains of 5.47% in IoU and 4.81% in F1-score, and in-distribution gains of 2.72% in True-Positive-IoU and 1.58% in True-Positive-F1.
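The mask-to-prompt conversion described above can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the linked repository). It assumes each building arrives as a binary NumPy mask for a single instance, and the helper names `mask_to_box`, `mask_to_center_point`, and `mask_to_random_points` are hypothetical. The skeleton-based multi-point strategy is omitted because it would need an extra morphology library.

```python
import numpy as np


def _foreground(mask):
    """Return (ys, xs) of foreground pixels; raise if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no foreground pixels")
    return ys, xs


def mask_to_box(mask):
    """Bounding-box prompt as [x_min, y_min, x_max, y_max] in pixel coordinates."""
    ys, xs = _foreground(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])


def mask_to_center_point(mask):
    """Single-point prompt: the foreground pixel closest to the mask centroid.

    Snapping to a foreground pixel matters for non-convex footprints
    (e.g. L-shaped buildings), whose centroid can fall outside the mask.
    """
    ys, xs = _foreground(mask)
    cy, cx = ys.mean(), xs.mean()
    i = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    return np.array([xs[i], ys[i]])


def mask_to_random_points(mask, k, seed=0):
    """Multi-point prompt: up to k distinct foreground pixels, as (x, y) rows."""
    ys, xs = _foreground(mask)
    rng = np.random.default_rng(seed)
    idx = rng.choice(ys.size, size=min(k, ys.size), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)
```

The helpers return points in (x, y) order and boxes as corner coordinates, the layout SAM's prompt inputs are generally expected to use. Passing them to a SAM predictor, together with a label of 1 for each foreground point, is left out here to keep the sketch free of model dependencies.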