Abstract: Since the advent of deep learning, convolutional neural networks (CNNs) and vision transformers (ViTs) have been extensively studied and widely used in medical image classification. However, CNNs are limited in their ability to model long-range dependencies, which hurts classification performance. ViTs, in contrast, are hampered by the quadratic computational complexity of their self-attention mechanism, which makes them difficult to deploy in real-world settings with limited computational resources. Recent studies have shown that state space models (SSMs), represented by Mamba, can effectively model long-range dependencies while maintaining linear computational complexity. Inspired by this, we propose MedMamba, the first Vision Mamba for generalized medical image classification. Concretely, we introduce a novel hybrid basic block named SS-Conv-SSM, which integrates convolutional layers for local feature extraction with the ability of SSMs to capture long-range dependencies, aiming to model medical images from different imaging modalities efficiently. By employing a grouped convolution strategy and a channel-shuffle operation, MedMamba achieves fewer model parameters and a lower computational burden without sacrificing accuracy. We thoroughly evaluated MedMamba on 16 datasets covering ten imaging modalities and 411,007 images. Experimental results show that MedMamba achieves competitive performance on most tasks compared with state-of-the-art methods. This work aims to explore the potential of Vision Mamba and establish a new baseline for medical image classification, thereby providing valuable insights for developing more powerful Mamba-based artificial intelligence algorithms and applications in medicine.
The source code and all pre-trained weights of MedMamba are available at <a class="link-external link-https" href="https://github.com/YubiaoYue/MedMamba" rel="external noopener nofollow">this https URL</a>.
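
The contrast the abstract draws between quadratic self-attention and a linear-time SSM can be illustrated with a toy recurrence. The sketch below is not MedMamba's or Mamba's implementation: it assumes a scalar state with fixed parameters `a`, `b`, and `c`, whereas Mamba uses input-dependent parameters and a multi-dimensional state. The function names `ssm_scan` and `attention_pair_count` are hypothetical and chosen only for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Assumption: fixed scalar parameters, purely for illustration.
    """
    h = 0.0
    ys = []
    for x in xs:  # one state update per token, so the cost grows linearly
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_pair_count(n):
    """Number of query-key pairs scored by full self-attention over n tokens."""
    return n * n


# An impulse at position 0 still influences the output several steps later,
# because the state h carries information forward through the sequence.
impulse = [1.0] + [0.0] * 9
ys = ssm_scan(impulse, a=0.5)
print(ys[:4])

# A 224x224 image cut into 16x16 patches gives 14 * 14 = 196 tokens.
tokens = (224 // 16) ** 2
print(tokens, attention_pair_count(tokens))
```

For 196 tokens the scan performs 196 state updates, while full self-attention scores 38,416 token pairs; the gap widens as the number of tokens grows.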

What problem does this paper attempt to address?

This paper addresses the limitations of existing deep learning models in medical image classification. Specifically:

1. **Convolutional Neural Networks (CNNs)**: They extract local features effectively but model long-range dependencies poorly, which leads to suboptimal classification performance.
2. **Vision Transformers (ViTs)**: They capture long-range dependencies effectively, but the quadratic computational complexity of the self-attention mechanism makes them difficult to deploy in practice, especially in resource-constrained environments.

To overcome these limitations, the authors propose **MedMamba**, the first Vision Mamba for generalized medical image classification, built on state space models (SSMs). MedMamba introduces a new hybrid basic block, **SS-Conv-SSM**, which combines convolutional layers for local feature extraction with the long-range dependency modeling capability of SSMs, aiming to handle medical images from different imaging modalities efficiently. In addition, by adopting a grouped convolution strategy and a channel-shuffle operation, MedMamba reduces the number of model parameters and the computational burden without sacrificing accuracy, which enables efficient medical AI applications.

In summary, the goal of this paper is to explore the potential of Vision Mamba, establish a new baseline for medical image classification, and provide valuable insights for developing more powerful Mamba-based AI algorithms and applications.
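
The parameter savings attributed to grouped convolution and channel shuffle can be made concrete with a short sketch. It shows the generic form of the two operations, not MedMamba's actual code: the channel counts, kernel size, and group count are illustrative assumptions, and the helper names `conv_weight_count` and `channel_shuffle` are hypothetical.

```python
def conv_weight_count(c_in, c_out, kernel, groups=1):
    """Weight count of a 2D convolution layer, ignoring the bias term.

    With grouping, each output channel only sees c_in / groups input channels.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to viewing the channels as a (groups, per_group) grid,
    transposing it, and flattening it again, so that the next grouped
    layer receives channels from every group.
    """
    n = len(channels)
    if n % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Illustrative numbers only: a 3x3 convolution with 64 input and 64 output channels.
dense = conv_weight_count(64, 64, 3)
grouped = conv_weight_count(64, 64, 3, groups=4)
print(dense, grouped)

# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

Under these assumed sizes, grouping by 4 cuts the weight count by a factor of 4. Because a grouped convolution never mixes channels from different groups, the shuffle is what lets information flow between groups in the following layer.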

MedMamba: Vision Mamba for Medical Image Classification

Vision Mamba for Classification of Breast Ultrasound Images

Medical Image Classification with a Hybrid SSM Model Based on CNN and Transformer

Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation

HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation

Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

A Survey on Visual Mamba

Mamba in Vision: A Comprehensive Survey of Techniques and Applications

Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters

MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation

FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba