Abstract:Recently, deep learning models have found extensive application in high-resolution land-cover segmentation research. However, the most current research still suffers from issues such as insufficient utilization of multi-modal information, which limits further improvement in high-resolution land-cover segmentation accuracy. Moreover, differences in the size and spatial resolution of multi-modal datasets collectively pose challenges to multi-modal land-cover segmentation. Therefore, we propose a high-resolution land-cover segmentation network (STSNet) with cross-spatial resolution spatio-temporal-spectral deep fusion. This network effectively utilizes spatio-temporal-spectral features to achieve information complementary among multi-modal data. Specifically, STSNet consists of four components: (1) A high resolution and multi-scale spatial-spectral encoder to jointly extract subtle spatial-spectral features in hyperspectral and high spatial resolution images. (2) A long-term spatio-temporal encoder formulated by spectral convolution and spatio-temporal transformer block to simultaneously delineates the spatial, temporal and spectral information in dense time series Sentinel-2 imagery. (3) A cross-resolution fusion module to alleviate the spatial resolution differences between multi-modal data and effectively leverages complementary spatio-temporal-spectral information. (4) A multi-scale decoder integrates multi-scale information from multi-modal data. We utilized airborne hyperspectral remote sensing imagery from the Shenyang region of China in 2020, with a spatial resolution of 1authors declare that they have no known competm, a spectral number of 249, and a spectral resolution <= 5 nm, and its Sentinel dense time-series images acquired in the same period with a spatial resolution of 10 m, a spectral number of 10, and a time-series number of 31. These datasets were combined to generate a multi-modal dataset called WHU-(HSR)-S-2-MT, which is the first open accessed large-scale high spatio-temporal-spectral satellite remote sensing dataset (i.e., with >2500 image pairs sized 300 m x 300 m for each). Additionally, we employed two open-source datasets to validate the effectiveness of the proposed modules. Extensive experiments show that our multi-scale spatial-spectral encoder, spatio-temporal encoder, and cross-resolution fusion module outperform existing state-of-the-art (SOTA) algorithms in terms of overall performance on high-resolution land-cover segmentation. The new multi-modal dataset will be made available at http://irsip.whu.edu.cn/resources/resources_en_v2.php, along with the corresponding code for accessing and utilizing the dataset at https://github.com/RS-Mage/STSNet.

STSNet: A Cross-Spatial Resolution Multi-Modal Remote Sensing Deep Fusion Network for High Resolution Land-Cover Segmentation

Generating 10-Meter Resolution Land Use and Land Cover Products Using Historical Landsat Archive Based on Super Resolution Guided Semantic Segmentation Network

Emmcnn: An Etps-Based Multi-Scale And Multi-Feature Method Using Cnn For High Spatial Resolution Image Land-Cover Classification

PSSTFN: A Progressive Spatial–Temporal–Spectral Fusion Network for Remote Sensing Images

Semi-Supervised Adversarial Semantic Segmentation Network Using Transformer and Multiscale Convolution for High-Resolution Remote Sensing Imagery

Multi-Attention-Based Semantic Segmentation Network for Land Cover Remote Sensing Images

Deep Learning-Based Spatiotemporal Fusion Architecture of Landsat 8 and Sentinel-2 Data for 10 m Series Imagery

A Transformer-based Multi-Modal Fusion Network for Semantic Segmentation of High-Resolution Remote Sensing Imagery

Spatial-Convolution Spectral-Transformer Interactive Network for Large-Scale Fast Refined Land Cover Classification and Mapping Based on ZY1-02D Satellite Hyperspectral Imagery

SEG-ESRGAN: A Multi-Task Network for Super-Resolution and Semantic Segmentation of Remote Sensing Images

Multi-scale attention fusion network for semantic segmentation of remote sensing images

An Attention-Fused Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

Integrating Detailed Features and Global Contexts for Semantic Segmentation in Ultra-High-Resolution Remote Sensing Images

Dynamic High-Resolution Network for Semantic Segmentation in Remote-Sensing Images

Multistage Progressive Interactive Fusion Network for Sentinel-2: High Resolution for All Bands

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

A Stage-Adaptive Selective Network with Position Awareness for Semantic Segmentation of LULC Remote Sensing Images

Semantic Segmentation of Land Cover in Urban Areas by Fusing Multisource Satellite Image Time Series

A Multilevel Multimodal Fusion Transformer for Remote Sensing Semantic Segmentation

Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation

SSNet: A Novel Transformer and CNN Hybrid Network for Remote Sensing Semantic Segmentation