MedMamba: Vision Mamba for Medical Image Classification

Yubiao Yue, Zhenzhang Li
2024-09-29
Abstract: Since the rise of deep learning, convolutional neural networks (CNNs) and vision transformers (ViTs) have been extensively studied and widely used in medical image classification tasks. Unfortunately, CNNs' limitations in modeling long-range dependencies result in poor classification performance. ViTs, in turn, are hampered by the quadratic computational complexity of their self-attention mechanism, which makes them difficult to deploy in real-world settings with limited computational resources. Recent studies have shown that state space models (SSMs), represented by Mamba, can effectively model long-range dependencies while maintaining linear computational complexity. Inspired by this, we propose MedMamba, the first Vision Mamba for generalized medical image classification. Concretely, we introduce a novel hybrid basic block named SS-Conv-SSM, which integrates convolutional layers for extracting local features with the ability of SSMs to capture long-range dependencies, aiming to model medical images from different imaging modalities efficiently. By employing a grouped convolution strategy and a channel-shuffle operation, MedMamba achieves fewer model parameters and a lower computational burden for efficient applications without sacrificing accuracy. We thoroughly evaluated MedMamba on 16 datasets covering ten imaging modalities and 411,007 images. Experimental results show that MedMamba achieves competitive performance on most tasks compared with state-of-the-art methods. This work aims to explore the potential of Vision Mamba and establish a new baseline for medical image classification, thereby providing valuable insights for developing more powerful Mamba-based artificial intelligence algorithms and applications in medicine.
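The abstract's efficiency argument rests on SSMs processing a sequence in linear time, whereas self-attention scores every pair of tokens. The sketch below is a deliberately simplified illustration of that contrast, not MedMamba's SSM layer: it assumes a scalar hidden state and fixed parameters `a`, `b`, `c`, whereas real Mamba layers use a vector state with input-dependent ("selective") parameters, and image features must first be flattened into a token sequence. The helper names `ssm_scan` and `attention_pairs` are hypothetical.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a toy discretized linear state space model over a sequence.

    Simplifying assumptions: scalar hidden state and fixed (non-selective)
    parameters. The recurrence is
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    """
    h = 0.0  # initial hidden state
    ys = []
    for x_t in x:  # a single pass over the sequence: O(L) steps
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


def attention_pairs(length: int) -> int:
    """Number of query-key scores full self-attention computes: O(L^2)."""
    return length * length


if __name__ == "__main__":
    # An impulse decays geometrically through the hidden state.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))  # [2.0, 1.0, 0.5]
    print(attention_pairs(196))  # 38416
```

For a hypothetical 14 × 14 grid of 196 tokens, `ssm_scan` performs 196 recurrence steps, while full attention computes 38,416 pairwise scores; doubling the token count doubles the former and quadruples the latter.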
The source code and all pre-trained weights of MedMamba are available at https://github.com/YubiaoYue/MedMamba.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the limitations of existing deep learning models in medical image classification. Specifically:

1. **Convolutional Neural Networks (CNNs)**: they extract local features effectively but are poor at modeling long-range dependencies, which leads to suboptimal classification performance.
2. **Vision Transformers (ViTs)**: they capture long-range dependencies well, but the quadratic computational complexity of self-attention makes them difficult to deploy in practice, especially in resource-constrained environments.

To overcome these limitations, the authors propose **MedMamba**, described as the first Vision Mamba for generalized medical image classification, built on state space models (SSMs). MedMamba introduces a new hybrid basic block, **SS-Conv-SSM**, which combines convolutional layers for local feature extraction with the long-range dependency modeling capability of SSMs, so that medical images from different imaging modalities can be handled efficiently. In addition, by adopting a grouped convolution strategy and a channel-shuffle operation, MedMamba reduces the number of model parameters and the computational burden without sacrificing accuracy, enabling efficient medical AI applications.

In summary, the paper aims to explore the potential of Vision Mamba, establish a new baseline for medical image classification, and provide insights for developing more powerful Mamba-based AI algorithms and applications in medicine.
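The parameter savings attributed to grouped convolution and channel shuffle can be illustrated with generic arithmetic. The sketch below is an assumption-labeled example, not MedMamba's actual layer configuration: the helpers `conv_params` and `channel_shuffle` are hypothetical, the 64-channel width and group counts are arbitrary, and the shuffle follows the reshape-transpose-flatten scheme popularized by ShuffleNet, operating here on a plain list that stands in for the channel axis.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def conv_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = False) -> int:
    """Weight count of a k x k 2-D convolution split into `groups` groups.

    Each output channel only sees c_in / groups input channels, so the
    weight tensor holds c_out * (c_in // groups) * k * k entries.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    weights = c_out * (c_in // groups) * k * k
    return weights + (c_out if bias else 0)


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Channel shuffle on a flat list of channels.

    Equivalent to reshaping to (groups, n), transposing to (n, groups) and
    flattening, so consecutive outputs come from different input groups and
    information can flow between groups in the next grouped layer.
    """
    if len(channels) % groups:
        raise ValueError("number of channels must be divisible by groups")
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


if __name__ == "__main__":
    print(conv_params(64, 64, 3))            # 36864 weights, standard conv
    print(conv_params(64, 64, 3, groups=4))  # 9216 weights, 4 groups
    print(channel_shuffle(list("abcdef"), 2))  # ['a', 'd', 'b', 'e', 'c', 'f']
```

With `groups=g`, the weight count drops by a factor of `g`, but each group then only sees its own slice of channels; the shuffle is the cheap, parameter-free step that lets later grouped layers mix information across groups.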