PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery

Libo Wang,Dongxu Li,Sijun Dong,Xiaoliang Meng,Xiaokang Zhang,Danfeng Hong

2024-06-16

Abstract: Semantic segmentation, as a basic tool for intelligent interpretation of remote sensing images, plays a vital role in many Earth Observation (EO) applications. Nowadays, accurate semantic segmentation of remote sensing images remains a challenge due to the complex spatial-temporal scenes and multi-scale geo-objects. Driven by the wave of deep learning (DL), CNN- and Transformer-based semantic segmentation methods have been explored widely, and these two architectures both revealed the importance of multi-scale feature representation for strengthening semantic information of geo-objects. However, the actual multi-scale feature fusion often comes with the semantic redundancy issue due to homogeneous semantic contents in pyramid features. To handle this issue, we propose a novel Mamba-based segmentation network, namely PyramidMamba. Specifically, we design a plug-and-play decoder, which develops a dense spatial pyramid pooling (DSPP) to encode rich multi-scale semantic features and a pyramid fusion Mamba (PFM) to reduce semantic redundancy in multi-scale feature fusion. Comprehensive ablation experiments illustrate the effectiveness and superiority of the proposed method in enhancing multi-scale feature representation as well as the great potential for real-time semantic segmentation. Moreover, our PyramidMamba yields state-of-the-art performance on three publicly available datasets, i.e., the OpenEarthMap (70.8% mIoU), ISPRS Vaihingen (84.8% mIoU) and Potsdam (88.0% mIoU) datasets. The code will be available at https://github.com/WangLibo1995/GeoSeg.
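The results in the abstract are reported as mIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal sketch in plain Python. The function name `mean_iou` and the toy labels are illustrative assumptions, and the paper's own evaluation protocol (for example, which classes are ignored on each dataset) is not described on this page.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    `pred` and `target` are flat sequences of integer class labels, one per pixel.
    """
    # Confusion matrix: rows are ground-truth classes, columns are predicted classes.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                                  # true c, predicted otherwise
        fp = sum(conf[r][c] for r in range(num_classes)) - tp   # predicted c, true otherwise
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.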

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses how to aggregate multi-scale features effectively in remote sensing image semantic segmentation, improving segmentation accuracy while reducing semantic redundancy in feature fusion. Specifically, the paper points out:

1. **Challenges of Multi-Scale Feature Representation**: Existing methods based on Convolutional Neural Networks (CNNs) and Transformers have limitations in handling multi-scale features. CNN methods usually produce coarse segmentation because of a single, limited receptive field. Transformers can capture global contextual information, but they have high computational complexity and low efficiency.
2. **Redundancy Problem in Multi-Scale Feature Fusion**: During multi-scale feature fusion, the large amount of homogeneous semantic information in the pyramid features degrades the fused representation and, in turn, the final segmentation performance.

To address these issues, the paper proposes PyramidMamba, a novel segmentation network based on the Mamba architecture. The network combines a Dense Spatial Pyramid Pooling (DSPP) module, which enhances multi-scale feature representation, with a Pyramid Fusion Mamba (PFM) module, which reduces redundancy in feature fusion. The specific contributions include:

1. **Rethinking Pyramid Feature Fusion Scheme**: Proposing a Mamba-based segmentation network (PyramidMamba) to improve multi-scale feature representation.
2. **Designing a Mamba-Based Decoder**: Applying dense spatial pooling to generate more fine-grained multi-scale context and using Mamba's selective characteristics to reduce homogeneous semantic information, while also demonstrating potential for building real-time semantic segmentation networks.
3. **Experimental Validation**: Conducting comprehensive experiments on three widely used remote sensing image semantic segmentation datasets, on which PyramidMamba reports state-of-the-art accuracy: 70.8% mIoU on OpenEarthMap, 84.8% mIoU on ISPRS Vaihingen and 88.0% mIoU on ISPRS Potsdam.
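To make the idea of spatial pyramid pooling more concrete, the following is a minimal sketch in plain Python of generic multi-scale average pooling on a single-channel feature map. It is not the authors' DSPP module: the scale set, the average pooling, the nearest-neighbour upsampling and all function names are assumptions chosen for illustration, and the Mamba-based fusion step (PFM) is not modelled at all.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) down to a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Bin i covers rows [floor(i*h/bins), ceil((i+1)*h/bins)).
        r0, r1 = (i * h) // bins, ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, ((j + 1) * w + bins - 1) // bins
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Nearest-neighbour upsampling of a square grid back to h x w."""
    bins = len(pooled)
    return [[pooled[y * bins // h][x * bins // w] for x in range(w)] for y in range(h)]


def pyramid_features(fmap, scales):
    """Return the input map plus one pooled-and-upsampled map per scale.

    The returned list plays the role of a channel-wise stack of pyramid features.
    """
    h, w = len(fmap), len(fmap[0])
    feats = [fmap]
    for bins in scales:
        feats.append(upsample_nearest(adaptive_avg_pool(fmap, bins), h, w))
    return feats


# Toy 4x4 feature map with values 0..15 and two hypothetical pooling scales.
fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]
feats = pyramid_features(fmap, scales=(1, 2))
for level in feats:
    print(level[0])  # first row of each pyramid level
```

Each coarser level repeats the same averaged values over large regions, which illustrates why naively stacked pyramid features overlap heavily in content. This overlap is the redundancy that the summary above says the PFM module is designed to reduce.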
