Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

Yice Cao, Chenchen Liu, Zhenhua Wu, Wenxin Yao, Liu Xiong, Jie Chen, Zhixiang Huang

2024-10-08

Abstract: As remote sensing imaging technology continues to advance, processing high-resolution and diversified satellite imagery to improve segmentation accuracy and interpretation efficiency has emerged as a pivotal area of investigation in remote sensing. Although segmentation algorithms based on CNNs and Transformers have achieved significant progress in performance, balancing segmentation accuracy and computational complexity remains challenging, which limits their wide application in practical tasks. To address this, this paper introduces the state space model (SSM) and proposes a novel hybrid semantic segmentation network based on Vision Mamba (CVMH-UNet). The method designs a cross-scanning visual state space block (CVSSBlock) that uses cross 2D scanning (CS2D) to fully capture global information from multiple directions. By incorporating convolutional neural network branches to overcome the constraints of Vision Mamba (VMamba) in acquiring local information, the approach facilitates a comprehensive analysis of both global and local features. Furthermore, to address the limited discriminative power of direct skip connections and the difficulty of achieving detailed fusion with them, a multi-frequency multi-scale feature fusion block (MFMSBlock) is designed. This module introduces multi-frequency information through the 2D discrete cosine transform (2D DCT) to enhance information utilization and provides additional scale-specific local detail through point-wise convolution branches. Finally, it aggregates multi-scale information along the channel dimension, achieving refined feature fusion. Experiments on well-known remote sensing image datasets demonstrate that the proposed CVMH-UNet achieves superior segmentation performance while maintaining low computational complexity, surpassing current leading-edge segmentation algorithms.
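The abstract does not spell out how CS2D orders the pixels, so the following is only a minimal sketch of one common reading: the four-direction cross scan used by VMamba-style models (row-major, column-major, and the reverse of each). The function names `cross_scan` and `cross_merge` are hypothetical and are not taken from the paper, and the state space model that would process each sequence is omitted.

```python
import numpy as np

def cross_scan(x):
    """Unfold a (C, H, W) feature map into four 1D scan orders.

    Assumed orders (not confirmed by the paper): row-major,
    column-major, and the reverse of each. Returns (4, C, H*W).
    """
    c, h, w = x.shape
    row_major = x.reshape(c, h * w)
    col_major = x.transpose(0, 2, 1).reshape(c, h * w)
    return np.stack([row_major, col_major,
                     row_major[:, ::-1], col_major[:, ::-1]])

def cross_merge(seqs, h, w):
    """Undo each scan order and sum the four results into (C, H, W)."""
    _, c, _ = seqs.shape
    # Reversed sequences are flipped back before being added.
    row = seqs[0] + seqs[2][:, ::-1]
    col = seqs[1] + seqs[3][:, ::-1]
    # Column-major sequences are laid out as (C, W, H); restore (C, H, W).
    col_as_row = col.reshape(c, w, h).transpose(0, 2, 1).reshape(c, h * w)
    return (row + col_as_row).reshape(c, h, w)
```

In a full block, each of the four sequences would pass through a selective scan before `cross_merge` recombines them, so every pixel receives context from all four directions.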

Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper aims to address the issues of segmentation accuracy and interpretative efficiency in high-resolution and diverse satellite remote sensing image processing. Although segmentation algorithms based on Convolutional Neural Networks (CNNs) and Transformers have made significant progress in performance, they still face challenges in balancing segmentation accuracy and computational complexity, which limits their widespread application in practical tasks.

Specifically, the paper proposes a novel hybrid semantic segmentation network (CVMH-UNet) based on Vision Mamba and multi-scale multi-frequency feature fusion. This method designs a Cross-Scan Visual State Space Block (CVSSBlock), which comprehensively captures global information from multiple directions through Cross 2D Scanning (CS2D) and overcomes the limitations of Vision Mamba in acquiring local information by introducing a CNN branch, thereby achieving a comprehensive analysis of global and local features.

Furthermore, to address the limited discriminative ability and difficulty in achieving detailed fusion in direct skip connections, the paper also designs a Multi-Frequency Multi-Scale Feature Fusion Block (MFMSBlock). This module introduces multi-frequency information through 2D Discrete Cosine Transform (2D DCT) to enhance information utilization and provides additional local detail information through a pointwise convolution branch. Ultimately, it aggregates multi-scale information along the channel dimension to achieve fine feature fusion.

Experimental results show that the proposed CVMH-UNet achieves superior segmentation performance while maintaining low computational complexity, surpassing current leading segmentation algorithms.
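To make the multi-frequency idea concrete, here is a minimal sketch of how 2D DCT components can be turned into per-channel descriptors. It assumes a frequency-channel-attention style design in which channel groups are projected onto different DCT basis functions; the summary above does not confirm that MFMSBlock works this way. The helper names `dct_basis` and `multi_frequency_descriptor` and the choice of frequencies are hypothetical.

```python
import numpy as np

def dct_basis(u, v, h, w):
    """Unnormalized 2D DCT-II basis function for frequency (u, v)."""
    i = np.arange(h)[:, None]
    j = np.arange(w)[None, :]
    return (np.cos(np.pi * u * (i + 0.5) / h)
            * np.cos(np.pi * v * (j + 0.5) / w))

def multi_frequency_descriptor(x, freqs):
    """Project channel groups of a (C, H, W) map onto DCT frequencies.

    Channels are split into len(freqs) groups; each group is reduced
    to one scalar per channel using its assigned (u, v) basis.
    """
    c, h, w = x.shape
    groups = np.array_split(np.arange(c), len(freqs))
    out = np.empty(c)
    for idx, (u, v) in zip(groups, freqs):
        basis = dct_basis(u, v, h, w)
        # Weighted spatial sum = one DCT coefficient per channel.
        out[idx] = (x[idx] * basis).sum(axis=(1, 2))
    return out
```

With frequency (0, 0) the basis is constant, so the descriptor reduces to a plain spatial sum, which is proportional to global average pooling. Higher frequencies add information that average pooling alone discards, which is one plausible reading of "enhancing information utilization".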

MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images

CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation

RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation

RS3Mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation

MSVM-UNet: Multi-Scale Vision Mamba UNet for Medical Image Segmentation

Link Aggregation for Skip Connection–Mamba: Remote Sensing Image Segmentation Network Based on Link Aggregation Mamba

MSFMamba: Multi-Scale Feature Fusion State Space Model for Multi-Source Remote Sensing Image Classification

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

A ViT-Based Multiscale Feature Fusion Approach for Remote Sensing Image Segmentation

Remote Sensing Image Semantic Segmentation Network Based on Multi-Scale Feature Enhancement Fusion

A Remote Sensing Image Segmentation Model Based on Multi-Scale Feature Fusion

Semantic Segmentation Of Remote Sensing Images Based On Multi-Model Fusion

PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation

FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model

VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images

Remote Sensing Image Semantic Segmentation Method Based on a Deep Convolutional Neural Network and Multiscale Feature Fusion

Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

Multi-scale Spatial Aggregation Network for Remote Sensing Image Segmentation

Encoder- and Decoder-Based Networks Using Multiscale Feature Fusion and Nonlocal Block for Remote Sensing Image Semantic Segmentation