Multi-view Remote Sensing Image Segmentation With SAM priors

Zipeng Qi, Chenyang Liu, Zili Liu, Hao Chen, Yongchang Wu, Zhengxia Zou, Zhenwei Sh
2024-05-23
Abstract: Multi-view segmentation in Remote Sensing (RS) seeks to segment images from diverse perspectives within a scene. Recent methods leverage 3D information extracted from an Implicit Neural Field (INF), improving the consistency of results across views while requiring only a small number of labels (as few as 3-5) to reduce annotation effort. Nonetheless, achieving strong performance with labels from only a few views remains challenging, due to inadequate scene-wide supervision and insufficient semantic features within the INF. To address these issues, we propose to inject the prior of the visual foundation model Segment Anything (SAM) into the INF to obtain better results with limited training data. Specifically, we contrast SAM features between testing and training views to derive pseudo labels for each testing view, augmenting scene-wide labeling information. Subsequently, we introduce SAM features into the INF of the scene via a transformer, supplementing its semantic information. The experimental results demonstrate that our method outperforms the mainstream method, confirming the efficacy of SAM as a supplement to the INF for this task.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to achieve higher-quality segmentation with limited labeled data in multi-view remote sensing image segmentation. Existing methods use 3D information extracted from an Implicit Neural Field (INF) to make results consistent across views, but when labels are available for only a few views, performance suffers for two reasons: supervision does not cover the whole scene, and the INF carries too little semantic information. To solve these problems, the authors inject the prior knowledge of a large-scale visual foundation model, the Segment Anything Model (SAM), into the INF so that better segmentation results can be obtained from a limited amount of training data. The method has two stages: first, the INF of the scene is constructed; then SAM features are integrated into the INF through a transformer to supplement its semantic information. In addition, pseudo-labels are generated by comparing SAM features between the test views and the training views, which extends the labeling information across the scene. Experimental results show that the method outperforms mainstream methods on multi-view segmentation, supporting the effectiveness of SAM as a supplement to the INF.
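The pseudo-labeling idea can be illustrated with a minimal sketch. The abstract does not specify how SAM features are contrasted, so everything here is an assumption made for illustration: features are plain vectors standing in for SAM embeddings, the comparison is cosine similarity with a nearest-neighbor match, and the names `cosine`, `pseudo_labels`, `min_sim`, and `ignore` are hypothetical rather than taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def pseudo_labels(train_feats, train_labels, test_feats, min_sim=0.5, ignore=-1):
    """Give each test feature the label of its most similar training feature.

    Assumption: nearest-neighbor matching stands in for the paper's
    feature contrast. Matches weaker than `min_sim` get the `ignore`
    label so that unreliable pseudo labels can be skipped in training.
    """
    labels = []
    for feat in test_feats:
        sims = [cosine(feat, ref) for ref in train_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append(train_labels[best] if sims[best] >= min_sim else ignore)
    return labels


# Toy 2-D vectors standing in for SAM features of labeled training pixels.
train_feats = [[1.0, 0.0], [0.0, 1.0]]
train_labels = [0, 1]
# Features from an unlabeled test view; the last one matches neither class.
test_feats = [[0.9, 0.1], [0.1, 0.9], [-1.0, -1.0]]

print(pseudo_labels(train_feats, train_labels, test_feats))  # [0, 1, -1]
```

The similarity threshold is one plausible way to keep weak matches out of the supervision signal; the paper may filter or weight pseudo labels differently.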