Abstract: In the field of medical microscopic image classification (MIC), CNN-based and Transformer-based models have been extensively studied. However, CNNs struggle with modeling long-range dependencies, limiting their ability to fully utilize semantic information in images. Conversely, Transformers are hampered by the complexity of quadratic computations. To address these challenges, we propose a model based on the Mamba architecture: Microscopic-Mamba. Specifically, we designed the Partially Selected Feed-Forward Network (PSFFN) to replace the last linear layer of the Visual State Space Module (VSSM), enhancing Mamba's local feature extraction capabilities. Additionally, we introduced the Modulation Interaction Feature Aggregation (MIFA) module to effectively modulate and dynamically aggregate global and local features. We also incorporated a parallel VSSM mechanism to improve inter-channel information interaction while reducing the number of parameters. Extensive experiments have demonstrated that our method achieves state-of-the-art performance on five public datasets. Code is available at https://github.com/zs1314/Microscopic-Mamba.
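The contrast between quadratic and linear cost can be made concrete with simple counting. The sketch below is illustrative only: the 16-pixel patch size and the 224 and 448 pixel image sides are assumptions chosen for the example, not values from the paper, and the counts ignore hidden dimensions and constant factors.

```python
def num_patches(image_side: int, patch_side: int) -> int:
    """Number of non-overlapping square patches (tokens) in a square image."""
    per_side = image_side // patch_side
    return per_side * per_side


def attention_pairs(n_tokens: int) -> int:
    """Token pairs scored when every token attends to every token (quadratic)."""
    return n_tokens * n_tokens


def scan_steps(n_tokens: int) -> int:
    """Recurrence steps for one sequential pass over the tokens (linear)."""
    return n_tokens


# Hypothetical image sizes and patch size, chosen only for illustration.
for side in (224, 448):
    n = num_patches(side, 16)
    print(f"{side}x{side}: tokens={n}, "
          f"attention pairs={attention_pairs(n)}, scan steps={scan_steps(n)}")
```

Doubling the image side quadruples the token count (196 to 784), which multiplies the pairwise count by 16 but the scan length only by 4.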

What problem does this paper attempt to address?

This paper addresses the limitations of existing models in medical microscopic image classification (MIC). Specifically:

1. **Limitations of Convolutional Neural Networks (CNNs)**:
   - CNNs have difficulty modeling long-range dependencies, which restricts how fully they can use the semantic information in an image.
   - Their local receptive fields make it hard to capture information from distant regions of the image.
2. **Limitations of Transformer models**:
   - Although Transformers are good at global modeling, the time complexity of their self-attention mechanism is quadratic in sequence length, so the computational burden becomes excessive for long sequences.
   - This cost matters in real medical settings, which usually operate under strict limits on computational resources.
3. **Limitations of methods combining CNN and Transformer**:
   - Some studies combine CNNs and Transformers to reduce computational complexity, but often at the cost of the Transformer's ability to capture global information.

To address these challenges, the authors propose a new model based on the Mamba architecture: Microscopic-Mamba. The model aims to capture both global and local features while maintaining linear complexity. The specific improvements are:

- **Partially Selected Feed-Forward Network (PSFFN)**: replaces the last linear layer in the Visual State Space Module (VSSM) to strengthen local feature extraction.
- **Modulation Interaction Feature Aggregation (MIFA) module**: modulates and dynamically aggregates global and local features.
- **Parallel VSSM mechanism**: improves information interaction between channels while reducing the number of parameters.

With these improvements, the experimental results on five public datasets show that Microscopic-Mamba not only outperforms existing state-of-the-art methods but also has fewer parameters and lower computational complexity.
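The linear-complexity property mentioned above comes from the state-space recurrence that Mamba-style models build on: each output depends on a running state that is updated once per token. The following is a minimal sketch of that general idea, not the paper's VSSM. It assumes a scalar state and fixed parameters `a`, `b`, `c` (hypothetical names for this example), whereas Mamba uses vector states and input-dependent parameters.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """One linear-time pass of a scalar discrete state-space recurrence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    h = 0.0  # initial state
    ys = []
    for x in xs:  # a single loop over the sequence: cost grows linearly
        h = a * h + b * x
        ys.append(c * h)
    return ys


# An impulse at the first position still influences later outputs
# through the state, which is how distant context is carried forward.
impulse = [1.0, 0.0, 0.0, 0.0]
print(ssm_scan(impulse, a=0.5, b=1.0, c=1.0))  # [1.0, 0.5, 0.25, 0.125]
```

The first input keeps contributing to every later output with a decaying weight, so distant tokens can interact without the pairwise comparison that self-attention performs.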

Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters

Medical Image Classification with a Hybrid SSM Model Based on CNN and Transformer

MMViT-Seg: A Lightweight Transformer and CNN Fusion Network for COVID-19 Segmentation.

HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation

MedMamba: Vision Mamba for Medical Image Classification

MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion

Microscopic Hyperspectral Image Classification Based on Fusion Transformer with Parallel CNN

Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

LPAM: A lightweight medical segmentation network based on Mamba improved by prompt attention

VMamba: Visual State Space Model

MambaVision: A Hybrid Mamba-Transformer Vision Backbone