SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images

Kaiyu Li,Ruixun Liu,Xiangyong Cao,Xueru Bai,Feng Zhou,Deyu Meng,Zhi Wang

2024-11-04

Abstract: Remote sensing images play an irreplaceable role in fields such as agriculture, water resources, military, and disaster relief. Pixel-level interpretation is a critical aspect of remote sensing image applications; however, a prevalent limitation remains the need for extensive manual annotation. To address this, we introduce open-vocabulary semantic segmentation (OVSS) into the remote sensing context. However, because remote sensing images are sensitive to low-resolution features, the predicted masks exhibit distorted target shapes and ill-fitting boundaries. To tackle this issue, we propose a simple and general upsampler, SimFeatUp, to restore lost spatial information in deep features in a training-free style. Further, based on the observation of the abnormal response of local patch tokens to the [CLS] token in CLIP, we propose a straightforward subtraction operation to alleviate the global bias in patch tokens. Extensive experiments are conducted on 17 remote sensing datasets spanning semantic segmentation, building extraction, road detection, and flood detection tasks. Our method achieves an average of 5.8%, 8.2%, 4.0%, and 15.3% improvement over state-of-the-art methods on 4 tasks. All code is released at https://earth-insights.github.io/SegEarth-OV

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses open-vocabulary semantic segmentation (OVSS) of remote sensing images without task-specific training. The goal is pixel-level interpretation of remote sensing images without large amounts of manual annotation. Traditional methods need extensive manually annotated data to train a model, and such labels are very costly to obtain at scale for remote sensing imagery. In addition, remote sensing images are sensitive to low-resolution features, so the predicted masks of existing methods show distorted target shapes and ill-fitting boundaries.

To solve these problems, the authors propose SegEarth-OV, which contains two main innovations:

1. **SimFeatUp**: A simple and general feature upsampler that recovers the spatial information lost in deep features without using labels. After being trained on a small number of unlabeled images, SimFeatUp can upsample the features of any remote sensing image while keeping them semantically consistent with the image content.
2. **Global Bias Mitigation**: The authors observe that in CLIP, local patch features are affected by the global feature (the [CLS] token), which biases the predictions. They therefore subtract the global feature from the local patch features to reduce this bias.

With these two components, SegEarth-OV is evaluated on 17 remote sensing datasets covering semantic segmentation, building extraction, road detection, and flood detection. It achieves average improvements of 5.8%, 8.2%, 4.0%, and 15.3% over state-of-the-art methods on the four tasks, with notable gains on single-class extraction tasks.

In conclusion, the main contribution of this paper is a training-free framework for high-quality open-vocabulary semantic segmentation of remote sensing images, which reduces the dependence on large-scale annotated data and improves segmentation accuracy.
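
The global bias mitigation step can be illustrated with a toy sketch. It is not taken from the released code: the function names, the two-dimensional features, and the `weight` parameter that scales the subtraction are assumptions made for illustration. Plain Python lists keep the example dependency-free, whereas a real pipeline would operate on CLIP feature tensors.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def debias_patches(patch_feats, global_feat, weight=1.0):
    """Subtract the weighted global feature from every patch feature.

    `weight` is a hypothetical knob for this sketch; the summary above only
    states that the global feature is subtracted from the local features.
    """
    return [[p - weight * g for p, g in zip(patch, global_feat)]
            for patch in patch_feats]


def classify_patches(patch_feats, text_feats):
    """Label each patch with the index of the most cosine-similar text feature."""
    text_unit = [normalize(t) for t in text_feats]
    labels = []
    for patch in patch_feats:
        p = normalize(patch)
        scores = [sum(a * b for a, b in zip(p, t)) for t in text_unit]
        labels.append(scores.index(max(scores)))
    return labels


# Toy data: two class embeddings (class 0 and class 1) in a 2-D feature space.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
# A global feature dominated by class 0, added to every local patch feature.
global_feat = [2.0, 0.0]
local_feats = [[0.2, 1.0], [1.0, 0.1]]  # patch 0 is really class 1
patch_feats = [[l + g for l, g in zip(p, global_feat)] for p in local_feats]

before = classify_patches(patch_feats, text_feats)
after = classify_patches(debias_patches(patch_feats, global_feat), text_feats)
print("before:", before)
print("after: ", after)
```

In this toy setup the shared global component pulls both patches toward class 0, and removing it lets the first patch recover its local class. How strongly to subtract is a design choice that this sketch leaves to the assumed `weight` argument.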

Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation

Efficient Patch-Wise Semantic Segmentation for Large-Scale Remote Sensing Images

Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models

Open-Vocabulary Remote Sensing Image Semantic Segmentation

MetaSegNet: Metadata-Collaborative Vision-Language Representation Learning for Semantic Segmentation of Remote Sensing Images

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Semantic Segmentation for Multisource Remote Sensing Images Incorporating Feature Slice Reconstruction and Attention Upsampling

A Creative Weak Supervised Semantic Segmentation for Remote Sensing Images

A deep learning based framework for remote sensing image ground object segmentation

Segmentation of VHR EO Images using Unsupervised Learning

Enhanced semantic-positional feature fusion network via diverse pre-trained encoders for remote sensing image water-body segmentation

Advancing high-resolution remote sensing: a compact and powerful approach to semantic segmentation

SegCLIP: Multimodal Visual-Language and Prompt Learning for High-Resolution Remote Sensing Semantic Segmentation

EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

Enhancing Multiscale Representations with Transformer for Remote Sensing Image Semantic Segmentation

Learning Open-vocabulary Semantic Segmentation Models from Natural Language Supervision

Semi-Supervised Adversarial Semantic Segmentation Network Using Transformer and Multiscale Convolution for High-Resolution Remote Sensing Imagery

OpenSD: Unified Open-Vocabulary Segmentation and Detection