Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models

Jielu Zhang, Zhongliang Zhou, Gengchen Mai, Mengxuan Hu, Zihan Guan, Sheng Li, Lan Mu

2024-08-25

Abstract: Remote sensing imagery has attracted significant attention in recent years due to its instrumental role in global environmental monitoring, land usage monitoring, and more. As image databases grow each year, performing automatic segmentation with deep learning models has gradually become the standard approach for processing the data. Despite the improved performance of current models, certain limitations remain unresolved. Firstly, training deep learning models for segmentation requires per-pixel annotations. Given the large size of datasets, only a small portion is fully annotated and ready for training. Additionally, the high intra-dataset variance in remote sensing data limits the transfer learning ability of such models. Although recently proposed generic segmentation models like SAM have shown promising results in zero-shot instance-level segmentation, adapting them to semantic segmentation is a non-trivial task. To tackle these challenges, we propose a novel method named Text2Seg for remote sensing semantic segmentation. Text2Seg overcomes the dependency on extensive annotations by employing an automatic prompt generation process using different visual foundation models (VFMs), which are trained to understand semantic information in various ways. This approach not only reduces the need for fully annotated datasets but also enhances the model's ability to generalize across diverse datasets. Evaluations on four widely adopted remote sensing datasets demonstrate that Text2Seg significantly improves zero-shot prediction performance compared to the vanilla SAM model, with relative improvements ranging from 31% to 225%. Our code is available at https://github.com/Douglas2Code/Text2Seg.

Computer Vision and Pattern Recognition, Artificial Intelligence
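The abstract describes the approach only at a high level: text-guided VFMs generate prompts automatically, so that SAM, which returns class-agnostic masks, can be steered toward a named class. The following is a minimal sketch of that idea under stated assumptions, not the Text2Seg implementation. It assumes a text-image similarity heatmap (an HxW array in which higher values mean a closer match to a class name such as "building") has already been produced by some CLIP-style model, and it stands in for SAM's output with plain boolean masks. The helpers `heatmap_to_point_prompts` and `masks_to_semantic`, and the 0.5 threshold, are hypothetical.

```python
import numpy as np


def heatmap_to_point_prompts(heatmap: np.ndarray, k: int = 3) -> list[tuple[int, int]]:
    """Return the k highest-scoring pixels of a text-image heatmap as (x, y) points.

    Hypothetical helper: `heatmap` is assumed to be an HxW similarity map from a
    CLIP-style model, where larger values mean a closer match to the text query.
    """
    # Flat indices of the k largest values, highest first.
    top = np.argsort(heatmap, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(top, heatmap.shape)
    # Points are written as (x, y) = (column, row), the order SAM uses for point prompts.
    return [(int(c), int(r)) for r, c in zip(rows, cols)]


def masks_to_semantic(masks: list[np.ndarray], heatmap: np.ndarray,
                      threshold: float = 0.5) -> np.ndarray:
    """Merge class-agnostic masks into one binary semantic mask for the queried class.

    Hypothetical rule: a mask is kept when the mean heatmap score inside it
    exceeds `threshold`; the threshold value is an illustrative assumption.
    """
    semantic = np.zeros(heatmap.shape, dtype=bool)
    for mask in masks:
        if mask.any() and heatmap[mask].mean() > threshold:
            semantic |= mask
    return semantic


# Toy example: a 4x4 "image" whose top-left 2x2 block matches the text query.
heatmap = np.zeros((4, 4))
heatmap[0:2, 0:2] = 0.9
heatmap[0, 1] = 0.95  # single strongest pixel
heatmap[3, 3] = 0.2

points = heatmap_to_point_prompts(heatmap, k=1)

# Stand-ins for SAM's class-agnostic masks (assumed here, not produced by SAM).
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[0:2, 0:2] = True
mask_b = np.zeros((4, 4), dtype=bool)
mask_b[2:, 2:] = True

semantic = masks_to_semantic([mask_a, mask_b], heatmap)
print(points)               # [(1, 0)]
print(int(semantic.sum()))  # 4
```

In this toy run only `mask_a` is kept, because the mean score inside `mask_b` is 0.05. A real pipeline would also have to resolve overlapping masks and several classes at once; a per-pixel arg-max over class scores is one common option, not necessarily what Text2Seg does.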

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper aims to address several key issues in the semantic segmentation of remote sensing images:

1. **High Data Annotation Cost**: Training deep learning models for segmentation requires pixel-level annotations. Because the datasets are large, only a small portion is fully annotated, which limits the scale of model training.
2. **High Variability Within Datasets**: Remote sensing data varies significantly across sensors, geographic locations, and acquisition times, which limits the model's transfer learning capability.
3. **Challenges of Zero-Shot Segmentation**: Although recently proposed general segmentation models such as SAM perform well in zero-shot instance-level segmentation, applying them to semantic segmentation remains challenging.

To address these challenges, the authors propose a new method called Text2Seg for semantic segmentation of remote sensing images. By automatically generating prompts with various Visual Foundation Models (VFMs), Text2Seg reduces the reliance on large amounts of annotated data and improves the model's generalization across different datasets. Experiments on four widely used remote sensing datasets show that Text2Seg significantly improves zero-shot prediction performance over vanilla SAM, with relative improvements ranging from 31% to 225%.
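The gains above are reported as relative improvements, without restating the metric or the baseline values. Assuming the standard definition, relative improvement = (new - baseline) / baseline, so a 225% gain means the Text2Seg score is 3.25 times the vanilla SAM score, and a 31% gain means 1.31 times. The helper names below are hypothetical, and the baseline of 0.20 is invented purely for illustration; it is not a score reported by the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline` as a fraction (0.31 means 31%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


def implied_score(baseline: float, rel: float) -> float:
    """Score that corresponds to a given relative improvement over `baseline`."""
    return baseline * (1.0 + rel)


# Hypothetical baseline score of 0.20 (illustrative only, not a paper result).
baseline = 0.20
for rel in (0.31, 2.25):
    new = implied_score(baseline, rel)
    print(f"{rel:.0%} relative gain: {baseline:.2f} -> {new:.3f} "
          f"({new / baseline:.2f}x the baseline)")
```

Because the figure is relative, the same 225% could come from a modest absolute change on a weak baseline, so the absolute scores are needed to judge how large each gain really is.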


RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

MetaSegNet: Metadata-collaborative Vision-Language Representation Learning for Semantic Segmentation of Remote Sensing Images

Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models

SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints

A Creative Weak Supervised Semantic Segmentation for Remote Sensing Images

SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images

Advancing high-resolution remote sensing: a compact and powerful approach to semantic segmentation

The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot

A deep learning based framework for remote sensing image ground object segmentation

Simple and Efficient: A Semisupervised Learning Framework for Remote Sensing Image Semantic Segmentation

MultiSenseSeg: A Cost-Effective Unified Multimodal Semantic Segmentation Model for Remote Sensing

MeSAM: Multiscale Enhanced Segment Anything Model for Optical Remote Sensing Images

EasySeg: An Error-Aware Domain Adaptation Framework for Remote Sensing Imagery Semantic Segmentation via Interactive Learning and Active Learning

SegMind: Semisupervised Remote Sensing Image Semantic Segmentation With Masked Image Modeling and Contrastive Learning Method

SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

SiamSeg: Self-Training with Contrastive Learning for Unsupervised Domain Adaptation Semantic Segmentation in Remote Sensing

Incorporating DeepLabv3+ and Object-Based Image Analysis for Semantic Segmentation of Very High Resolution Remote Sensing Images

Text4Seg: Reimagining Image Segmentation as Text Generation