RSMamba: Remote Sensing Image Classification with State Space Model

Keyan Chen, Bowen Chen, Chenyang Liu, Wenyuan Li, Zhengxia Zou, Zhenwei Shi
2024-03-29
Abstract: Remote sensing image classification forms the foundation of various understanding tasks, serving a crucial function in remote sensing image interpretation. The recent advancements of Convolutional Neural Networks (CNNs) and Transformers have markedly enhanced classification accuracy. Nonetheless, remote sensing scene classification remains a significant challenge, especially given the complexity and diversity of remote sensing scenarios and the variability of spatiotemporal resolutions. The capacity for whole-image understanding can provide more precise semantic cues for scene discrimination. In this paper, we introduce RSMamba, a novel architecture for remote sensing image classification. RSMamba is based on the State Space Model (SSM) and incorporates an efficient, hardware-aware design known as Mamba. It integrates the advantages of both a global receptive field and linear modeling complexity. To overcome the limitation of the vanilla Mamba, which can only model causal sequences and is not adaptable to two-dimensional image data, we propose a dynamic multi-path activation mechanism to augment Mamba's capacity to model non-causal data. Notably, RSMamba maintains the inherent modeling mechanism of the vanilla Mamba, yet exhibits superior performance across multiple remote sensing image classification datasets. This indicates that RSMamba holds significant potential to function as the backbone of future visual foundation models. The code will be available at \url{
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses remote sensing image classification. Specifically, the authors propose RSMamba (Remote Sensing Mamba), a new network architecture based on the State Space Model (SSM). The architecture aims to overcome the difficulties that existing methods face with complex and diverse remote sensing scenes and with varying spatiotemporal resolutions. The core contributions of RSMamba are:

1. **Efficient Global Feature Modeling**: By building on an SSM, RSMamba captures long-range dependencies across the entire remote sensing image with linear modeling complexity. The resulting whole-image semantic information helps improve classification accuracy.
2. **Dynamic Multi-Path Activation Mechanism**: The original Mamba model can only handle causal sequences and is insensitive to position, which makes it a poor fit for 2D image data. The paper introduces a dynamic multi-path activation mechanism that processes the token sequence along forward, backward, and random paths, so that the model can handle non-causal data.
3. **Performance Validation**: The paper reports experiments on three remote sensing image classification datasets. According to the results, RSMamba holds significant advantages over methods based on Convolutional Neural Networks (CNNs) and on attention mechanisms (such as the Transformer).

In summary, RSMamba aims to provide a feasible and efficient solution for the intelligent interpretation of large-scale remote sensing images. It performs well on datasets of different scales and, because it has relatively few parameters, it does not require a large amount of training data to reach good results.
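The multi-path idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: `causal_scan` is a toy scalar linear recurrence standing in for a Mamba block, the tokens are plain floats rather than patch embeddings, and the softmax gate over per-path mean outputs (with the hypothetical `gate_weights` parameter) is an assumption made only to show how several causal scans over reordered tokens can be restored to the original order and mixed.

```python
import math
import random


def causal_scan(tokens, a=0.9, b=1.0):
    """Toy causal recurrence h_t = a * h_{t-1} + b * x_t (stand-in for a Mamba block).

    Output t depends only on inputs 0..t, so the scan is strictly causal.
    """
    h = 0.0
    out = []
    for x in tokens:
        h = a * h + b * x
        out.append(h)
    return out


def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_path_scan(tokens, gate_weights, seed=0):
    """Scan tokens along forward, backward, and shuffled orders, then mix.

    gate_weights: one hypothetical scalar weight per path (an assumption,
    standing in for a learned gating layer).
    """
    n = len(tokens)
    forward = list(range(n))
    backward = forward[::-1]
    shuffled = forward[:]
    random.Random(seed).shuffle(shuffled)

    restored = []
    for order in (forward, backward, shuffled):
        scanned = causal_scan([tokens[i] for i in order])
        # Undo the reordering so every path is indexed by original position.
        path_out = [0.0] * n
        for pos, idx in enumerate(order):
            path_out[idx] = scanned[pos]
        restored.append(path_out)

    # One gate per path, computed from the mean of that path's output.
    scores = [w * (sum(p) / n) for w, p in zip(gate_weights, restored)]
    gates = softmax(scores)
    mixed = [sum(g * p[i] for g, p in zip(gates, restored)) for i in range(n)]
    return mixed, gates


# An impulse at the last position: a single causal scan cannot pass
# information from it back to position 0, but the mixed output can.
tokens = [0.0, 0.0, 0.0, 1.0]
single = causal_scan(tokens)
mixed, gates = multi_path_scan(tokens, gate_weights=[1.0, 1.0, 1.0])
```

In this example `single[0]` is `0.0` because the forward scan has not yet seen the impulse, while `mixed[0]` is positive because the backward path carries information from the last token to the first. This is the sense in which combining several scan orders lets a causal sequence model handle non-causal data.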