PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation

Yin Hu, Xianping Ma, Jialu Sui, Man-On Pun
2024-09-10
Abstract: Semantic segmentation is a vital task in the field of remote sensing (RS). However, conventional convolutional neural network (CNN) and transformer-based models face limitations in capturing long-range dependencies or are often computationally intensive. Recently, an advanced state space model (SSM), namely Mamba, was introduced, offering linear computational complexity while effectively establishing long-distance dependencies. Despite their advantages, Mamba-based methods encounter challenges in preserving local semantic information. To cope with these challenges, this paper proposes a novel network called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for RS semantic segmentation tasks. The core structure of PPMamba, the Pyramid Pooling-State Space Model (PP-SSM) block, combines a local auxiliary mechanism with an omnidirectional state space model (OSS) that selectively scans feature maps from eight directions, capturing comprehensive feature information. Additionally, the auxiliary mechanism includes pyramid-shaped convolutional branches designed to extract features at multiple scales. Extensive experiments on two widely-used datasets, ISPRS Vaihingen and LoveDA Urban, demonstrate that PPMamba achieves competitive performance compared to state-of-the-art models.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to address several key issues in semantic segmentation of remote sensing images:

1. **Limitations of Existing Methods**:
   - Convolutional neural networks (CNNs) are adept at capturing local information, but their restricted receptive field limits how well they handle global context.
   - Transformers model long-range dependencies effectively, but their computational cost is high on high-resolution, large-scale remote sensing data.
2. **Advantages and Challenges of the Mamba Architecture**:
   - The Mamba architecture captures long-range dependencies with linear computational complexity, but existing Mamba-based methods fall short in preserving local details.
3. **Proposed New Method**:
   - A new network, Pyramid Pooling Mamba (PPMamba), combines the strengths of CNNs and Mamba to capture local and global features together.
   - Its core structure, the PP-SSM block, integrates pyramid-shaped convolutional branches with the Omnidirectional State Space Model (OSS). The branches extract features at multiple scales, and the OSS scans feature maps from eight directions to capture global dependencies.

Experiments on two widely used datasets, ISPRS Vaihingen and LoveDA Urban, show that PPMamba achieves competitive performance compared with existing state-of-the-art models. This suggests that PPMamba has the potential to address the particular challenges of semantic segmentation in remote sensing images.
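
To make the idea of scanning a feature map "from eight directions" more concrete, the following is a minimal sketch of how eight one-dimensional visiting orders over a 2-D grid could be generated. The abstract does not say which eight directions the OSS uses, so the choice here (row-wise, column-wise, diagonal, and anti-diagonal, each forward and reversed) is an assumption, and the helper names `eight_direction_orders` and `flatten` are hypothetical. The sketch covers only the ordering step; the selective state space update applied to each sequence is not shown.

```python
def eight_direction_orders(height, width):
    """Return eight visiting orders over a height x width grid.

    Each order is a list of (row, col) pairs. The four base directions
    are an illustrative assumption, not taken from the paper.
    """
    cells = [(r, c) for r in range(height) for c in range(width)]
    forward = {
        # Row by row, left to right.
        "horizontal": cells,
        # Column by column, top to bottom.
        "vertical": sorted(cells, key=lambda rc: (rc[1], rc[0])),
        # Grouped by (col - row): lines running top-left to bottom-right.
        "diagonal": sorted(cells, key=lambda rc: (rc[1] - rc[0], rc[0])),
        # Grouped by (row + col): lines running top-right to bottom-left.
        "anti_diagonal": sorted(cells, key=lambda rc: (rc[0] + rc[1], rc[0])),
    }
    orders = {}
    for name, order in forward.items():
        orders[name] = list(order)
        orders[name + "_reversed"] = list(reversed(order))
    return orders


def flatten(feature_map, order):
    """Read a 2-D nested list as a 1-D sequence following the given order."""
    return [feature_map[r][c] for r, c in order]


if __name__ == "__main__":
    grid = [[1, 2, 3],
            [4, 5, 6]]
    for name, order in eight_direction_orders(2, 3).items():
        print(name, flatten(grid, order))
```

Each of the eight sequences visits every pixel exactly once, so a sequence model that runs in linear time per sequence stays linear in the number of pixels overall, while different orders place different pixels next to each other in the sequence.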