Abstract: Convolutional Neural Networks (CNNs) and Transformer-based self-attention models have become the standard for medical image segmentation. This paper demonstrates that convolution and self-attention, while widely used, are not the only effective methods for segmentation. Breaking with convention, we present a Convolution and self-Attention-free Mamba-based semantic Segmentation Network named CAMS-Net. Specifically, we design a Mamba-based Channel Aggregator and Spatial Aggregator, which are applied independently in each encoder-decoder stage. The Channel Aggregator extracts information across different channels, and the Spatial Aggregator learns features across different spatial locations. We also propose a Linearly Interconnected Factorized Mamba (LIFM) block to reduce the computational complexity of a Mamba block and to enhance its decision function by introducing a non-linearity between two factorized Mamba blocks. Our model outperforms the existing state-of-the-art CNN, self-attention, and Mamba-based methods on the CMR and M&Ms-2 cardiac segmentation datasets, showing how this convolution- and self-attention-free method can inspire further research beyond the CNN and Transformer paradigms while achieving linear complexity and reducing the number of parameters. Source code and pre-trained models are available at: https://github.com/kabbas570/CAMS-Net

What problem does this paper attempt to address?

The paper addresses cardiac image segmentation. Convolutional neural networks (CNNs) and Transformer models based on the self-attention mechanism are widely used for this task, but each has limitations, and the paper argues that they are not the only effective segmentation methods. Specifically:

1. **Limitations of CNNs**: CNNs perform well at local feature extraction, but their limited receptive field makes it difficult to capture long-distance dependencies, and they tend to recognize textures rather than shapes.
2. **Limitations of the self-attention mechanism**: Self-attention can capture global information and long-distance dependencies, but its computational complexity is quadratic in the sequence length, which leads to high computational cost and large memory requirements.

To overcome these limitations, the paper proposes CAMS-Net (Convolution and Attention-Free Mamba-based Cardiac Image Segmentation Network), a cardiac image segmentation network that uses neither convolution nor self-attention. By building on Mamba blocks and their variants, the network achieves linear computational complexity while keeping a global receptive field and the ability to model long-distance dependencies.

### Main contributions

1. **Proposing CAMS-Net**: The paper presents it as the first Mamba-based cardiac image segmentation network that uses no convolution and no self-attention at all.
2. **Linearly Interconnected Factorized Mamba (LIFM) block**: Factorizing the Mamba block and introducing a non-linearity between the two factorized blocks reduces the number of parameters and increases the model's non-linear capacity.
3. **Mamba Channel Aggregator (MCA) and Mamba Spatial Aggregator (MSA)**: These extract information along the channel and spatial dimensions, respectively.
4. **Extensive experimental verification**: Experiments on the CMR and M&Ms-2 datasets show that CAMS-Net outperforms existing CNN, self-attention, and hybrid-architecture methods.

### Method overview

- **Input processing**: The input image is split into non-overlapping 2x2 patches, which a linear embedding layer projects into a 64-dimensional feature space. Position encoding is added to preserve spatial context.
- **Encoder-decoder structure**: In each encoder stage, features are down-sampled by a 2x2 average pooling layer. In the bottleneck layer and the decoder stages, the CS-IF module fuses channel and spatial information.
- **Decoder**: In each decoder stage, features are up-sampled by bilinear interpolation and then processed by the CS-IF module and the MCA module.
- **Final output**: A Softmax activation produces a five-class segmentation map (left atrium, right atrium, left ventricle, right ventricle, and background).

### Experimental results

- **CMR dataset**: CAMS-Net outperforms existing methods on multiple metrics, most notably Dice Score and Hausdorff Distance.
- **M&Ms-2 dataset**: CAMS-Net also performs well in multi-center, multi-view, multi-disease clinical scenarios, especially on the RV segmentation task.

### Conclusion

With CAMS-Net, the paper shows the potential of convolution-free and self-attention-free methods for cardiac image segmentation. The network outperforms existing methods while also offering advantages in computational efficiency and parameter count, which points to a new direction for research on medical image segmentation.
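
The input-processing and down-sampling steps from the method overview can be illustrated without any convolution. The following is a minimal NumPy sketch, not the authors' implementation: the function names `patch_embed` and `avg_pool_2x2`, the single-channel 16x16 input, the random weights, and the random stand-in for the position encoding are all assumptions made for illustration, and the Mamba-based blocks themselves are omitted.

```python
import numpy as np


def patch_embed(image, weight, bias, patch=2):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles
    and project each flattened tile with a single linear layer."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (H/p, W/p, p*p*C)
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)
    tiles = tiles.reshape(h // patch, w // patch, patch * patch * c)
    return tiles @ weight + bias  # (H/p, W/p, embed_dim)


def avg_pool_2x2(features):
    """2x2 average pooling with stride 2 on an (H, W, D) feature map."""
    h, w, d = features.shape
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    return features.reshape(h // 2, 2, w // 2, 2, d).mean(axis=(1, 3))


rng = np.random.default_rng(0)

# Hypothetical single-channel 16x16 slice; real input sizes are not given here.
image = rng.standard_normal((16, 16, 1))

embed_dim = 64  # the 64-dimensional feature space named in the summary
weight = rng.standard_normal((2 * 2 * 1, embed_dim)) * 0.02
bias = np.zeros(embed_dim)

# Assumption: a random per-location tensor stands in for the position encoding.
pos = rng.standard_normal((8, 8, embed_dim)) * 0.02

tokens = patch_embed(image, weight, bias) + pos  # one 64-d token per 2x2 patch
pooled = avg_pool_2x2(tokens)                    # one encoder down-sampling step

print(tokens.shape, pooled.shape)
```

Because the patches do not overlap, the embedding reduces to a reshape followed by one matrix multiplication, which is why no convolution operator is needed at this stage. Each 2x2 pooling step halves both spatial dimensions while leaving the channel dimension unchanged.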

CAMS: Convolution and Attention-Free Mamba-based Cardiac Image Segmentation

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

CardSegNet: An adaptive hybrid CNN-vision transformer model for heart region segmentation in cardiac MRI

MedSegMamba: 3D CNN-Mamba Hybrid Architecture for Brain Segmentation

LPAM: A lightweight medical segmentation network based on Mamba improved by prompt attention

MSMHSA-DeepLab V3+: An Effective Multi-Scale, Multi-Head Self-Attention Network for Dual-Modality Cardiac Medical Image Segmentation

HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation

Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation

MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation

MSA$^2$Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation

CapNet: An Automatic Attention-Based with Mixer Model for Cardiovascular Magnetic Resonance Image Segmentation

EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation

BMCS-Net: A Bi-directional multi-scale cascaded segmentation network based on transformer-guided feature Aggregation for medical images

CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation

BSANet: Boundary-aware and Scale-Aggregation Networks for CMR Image Segmentation

Large Window-based Mamba UNet for Medical Image Segmentation: Beyond Convolution and Self-attention

Dual triple attention guided CNN-VMamba for medical image segmentation

CATS v2: Hybrid encoders for robust medical segmentation