Open-Vocabulary Remote Sensing Image Semantic Segmentation

Qinglong Cao, Yuntian Chen, Chao Ma, Xiaokang Yang
2024-09-12
Abstract: Open-vocabulary image semantic segmentation (OVS) seeks to segment images into semantic regions across an open set of categories. Existing OVS methods commonly depend on foundational vision-language models and utilize similarity computation to tackle OVS tasks. However, these approaches are predominantly tailored to natural images and struggle with the unique characteristics of remote sensing images, such as rapidly changing orientations and significant scale variations. These challenges complicate OVS tasks in earth vision, requiring specialized approaches. To tackle this dilemma, we propose the first OVS framework specifically designed for remote sensing imagery, drawing inspiration from the distinct remote sensing traits. Particularly, to address the varying orientations, we introduce a rotation-aggregative similarity computation module that generates orientation-adaptive similarity maps as initial semantic maps. These maps are subsequently refined at both spatial and categorical levels to produce more accurate semantic maps. Additionally, to manage significant scale changes, we integrate multi-scale image features into the upsampling process, resulting in the final scale-aware semantic masks. To advance OVS in earth vision and encourage reproducible research, we establish the first open-sourced OVS benchmark for remote sensing imagery, including four public remote sensing datasets. Extensive experiments on this benchmark demonstrate that our proposed method achieves state-of-the-art performance. All code and datasets are available at https://github.com/caoql98/OVRS.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses two main challenges in open-vocabulary semantic segmentation (OVS) of remote sensing images: rapidly changing orientations and significant scale variations. Existing OVS methods usually rely on foundational vision-language models and use similarity computation to handle OVS tasks. However, these methods are designed mainly for natural images and cope poorly with these two characteristics of remote sensing imagery, which degrades OVS performance in earth vision and calls for specialized methods.

Specifically, the paper proposes an OVS framework designed for remote sensing images that targets the following problems:

1. **Rapidly changing orientations**:
   - Objects in remote sensing images appear at arbitrary orientations, which poses a challenge to traditional segmentation methods.
   - The paper introduces a rotation-aggregative similarity computation module, which generates orientation-adaptive similarity maps as initial semantic maps, thereby reducing the impact of orientation changes.
2. **Significant scale variations**:
   - Objects in remote sensing images differ widely in scale, which makes accurate segmentation difficult.
   - The paper integrates multi-scale image features into the upsampling process to produce scale-aware semantic masks, thereby improving segmentation performance.

### Main contributions

1. **Propose the first open-vocabulary semantic segmentation framework specifically for remote sensing images**:
   - A new open-source remote sensing OVS benchmark, comprising four public remote sensing datasets, is established to support research in earth vision.
2. **Introduce a rotation-aggregative similarity computation module**:
   - Orientation-adaptive similarity maps are generated and used as initial semantic maps, addressing the problem of rapidly changing orientations. These maps are then refined at both the spatial and categorical levels.
3. **Generate scale-aware semantic masks through multi-scale feature integration**:
   - Multi-scale image features are integrated step by step during upsampling to produce the final scale-aware semantic masks.
4. **Extensive experimental verification**:
   - Experiments on the newly proposed remote sensing OVS benchmark show that the method achieves state-of-the-art performance, outperforming existing OVS methods designed for natural images.

Through these contributions, the paper provides a new approach to open-vocabulary semantic segmentation of remote sensing images.
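The two ideas summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that lives in the linked repository). It makes several assumptions that are not in the paper summary: rotations are restricted to multiples of 90 degrees, the rotated similarity maps are aggregated by simple averaging, and multi-scale fusion is reduced to nearest-neighbour upsampling plus channel concatenation. The names `toy_encoder`, `cosine_similarity_map`, `rotation_aggregated_similarity`, and `upsample_and_fuse` are hypothetical, and `toy_encoder` merely stands in for a vision-language image encoder.

```python
import numpy as np


def toy_encoder(image, patch=4, dim=8, seed=0):
    """Hypothetical stand-in for an image encoder: mean-pool patches, then
    apply a fixed random linear projection. (H, W, C) -> (H/patch, W/patch, dim)."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    cropped = image[: gh * patch, : gw * patch]
    pooled = cropped.reshape(gh, patch, gw, patch, c).mean(axis=(1, 3))
    proj = np.random.default_rng(seed).standard_normal((c, dim))
    return pooled @ proj


def cosine_similarity_map(image_feats, text_embeds, eps=1e-8):
    """Cosine similarity between every spatial feature and every class
    embedding. (H, W, D) and (K, D) -> (H, W, K)."""
    img = image_feats / (np.linalg.norm(image_feats, axis=-1, keepdims=True) + eps)
    txt = text_embeds / (np.linalg.norm(text_embeds, axis=-1, keepdims=True) + eps)
    return img @ txt.T


def rotation_aggregated_similarity(image, text_embeds, encoder, quarter_turns=(0, 1, 2, 3)):
    """Assumed aggregation scheme: rotate the image, compute a similarity map,
    rotate the map back so it aligns with the input, then average."""
    maps = []
    for k in quarter_turns:
        rotated = np.rot90(image, k=k, axes=(0, 1))
        sim = cosine_similarity_map(encoder(rotated), text_embeds)
        maps.append(np.rot90(sim, k=-k, axes=(0, 1)))  # undo the rotation
    return np.mean(maps, axis=0)


def upsample_and_fuse(sem_map, finer_feats):
    """Assumed fusion step: 2x nearest-neighbour upsampling of the semantic map,
    concatenated with finer-scale image features along the channel axis.
    A learned decoder would normally consume the result."""
    up = sem_map.repeat(2, axis=0).repeat(2, axis=1)
    if up.shape[:2] != finer_feats.shape[:2]:
        raise ValueError("finer_feats must have twice the spatial size of sem_map")
    return np.concatenate([up, finer_feats], axis=-1)


# Usage with random data: a 16x24 RGB image and 3 class embeddings of size 8.
rng = np.random.default_rng(1)
image = rng.random((16, 24, 3))
text_embeds = rng.standard_normal((3, 8))

agg = rotation_aggregated_similarity(image, text_embeds, toy_encoder)
single = cosine_similarity_map(toy_encoder(image), text_embeds)
fused = upsample_and_fuse(agg, toy_encoder(image, patch=2))
```

Because `toy_encoder` only pools and projects, it commutes with 90-degree rotations, so `agg` matches `single` here. A real vision-language encoder is generally not rotation-equivariant, so the per-rotation maps differ, and aggregating them is what makes the result orientation-adaptive.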