Abstract: In the field of biomedical image analysis, the quest for architectures capable of effectively capturing long-range dependencies is paramount, especially when dealing with 3D image segmentation, classification, and landmark detection. Traditional Convolutional Neural Networks (CNNs) struggle with limited local receptive fields, and Transformers have a heavy computational load when applied to high-dimensional medical
What problem does this paper attempt to address?
This paper addresses the challenge of capturing long-range dependencies in 3D biomedical image analysis, particularly for 3D image segmentation, classification, and landmark (keypoint) detection. Traditional Convolutional Neural Networks (CNNs) are constrained by their local receptive fields, while Transformers, although good at modeling global information, impose an excessive computational burden on high-dimensional medical images. The paper therefore proposes a new architecture, nnMamba, which combines the local feature extraction ability of CNNs with the efficient long-range dependency modeling ability of State-Space Models (SSMs).
### Main problems
1. **Long-range dependency modeling**: Traditional CNNs have difficulty capturing long-range dependencies in 3D biomedical images, both in dense prediction tasks (such as segmentation and landmark detection) and in classification tasks.
2. **Computational efficiency**: Transformers have high computational complexity when processing high-dimensional medical images, which limits their applicability.
### Solutions
To address these problems, the authors make the following contributions:
1. **Introducing the Mamba-In-Convolution with Channel-Spatial Siamese learning (MICCSS) module**:
- The MICCSS module fuses the strengths of CNNs and SSMs to model long-range relationships between voxels.
- It enhances feature interaction along the channel and spatial dimensions, improving the model's ability to capture long-range dependencies.
2. **Optimized design for different tasks**:
- **Segmentation and landmark detection**: Adopts a UNet architecture that combines a residual encoder with a convolutional decoder, and stabilizes training through a learning-based scaling method.
- **Classification tasks**: Introduces a Mamba layer to give features global context early, reducing the need for complex subsequent operations, and processes multi-scale features through hierarchical sequences.
3. **Experimental verification**:
- Extensive experiments on multiple public datasets, including BraTS 2023, AMOS2022, etc., evaluate nnMamba on segmentation, classification, and landmark detection tasks.
- The reported results show that nnMamba outperforms existing methods in accuracy while also being more efficient in parameter count and computational complexity.
### Formula representation
- The basic equation of the State-Space Model (SSM) is:
\[
x'(t) = A x(t) + B u(t); \quad y(t) = C x(t)
\]
where \( x(t) \in \mathbb{R}^N \) is the hidden state, \( u(t) \) and \( y(t) \) are the input and output signals, and \( A \in \mathbb{R}^{N \times N} \) and \( B, C \in \mathbb{R}^N \) are the system parameters.
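- The MICCSS module places an SSM between input convolutions (Convs.I) and output convolutions (Convs.O), with a residual connection around the SSM: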
\[
F_{\text{out}} = \text{Convs.O} \left( \text{SSM}(\text{Convs.I}(F_{\text{in}})) + \text{Convs.I}(F_{\text{in}}) \right)
\]
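The two formulas above can be made concrete with a small sketch. It is not the authors' implementation. It assumes a diagonal \( A \), a zero-order-hold discretization with a fixed step `dt`, a single channel, and 1-D convolutions over a flattened voxel sequence as stand-ins for the 3D convolution blocks. The channel-spatial Siamese part of MICCSS is omitted, and all function names (`ssm_scan`, `conv1d_same`, `mic_block`) are hypothetical.

```python
import math


def ssm_scan(u, a, b, c, dt=1.0):
    """Run a discretized diagonal SSM over a 1-D input sequence u.

    Continuous model: x'(t) = A x(t) + B u(t), y(t) = C x(t), with A = diag(a).
    Zero-order hold (assumed here) gives, per state dimension i:
        a_bar_i = exp(dt * a_i)
        b_bar_i = (a_bar_i - 1) / a_i * b_i      (requires a_i != 0)
    and the recurrence x_k = a_bar * x_{k-1} + b_bar * u_k, y_k = c . x_k.
    """
    a_bar = [math.exp(dt * ai) for ai in a]
    b_bar = [(ab - 1.0) / ai * bi for ab, ai, bi in zip(a_bar, a, b)]
    x = [0.0] * len(a)  # hidden state, starts at zero
    y = []
    for u_k in u:
        x = [ab * xi + bb * u_k for ab, bb, xi in zip(a_bar, b_bar, x)]
        y.append(sum(ci * xi for ci, xi in zip(c, x)))
    return y


def conv1d_same(seq, kernel):
    """Zero-padded 'same' 1-D correlation; a stand-in for a 3-D conv block."""
    half = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(seq):
                acc += w * seq[idx]
        out.append(acc)
    return out


def mic_block(f_in, k_in, k_out, a, b, c, dt=1.0):
    """F_out = Convs.O(SSM(Convs.I(F_in)) + Convs.I(F_in)), single channel."""
    h = conv1d_same(f_in, k_in)                # Convs.I(F_in): local features
    s = ssm_scan(h, a, b, c, dt)               # SSM(...): long-range mixing
    fused = [si + hi for si, hi in zip(s, h)]  # residual sum around the SSM
    return conv1d_same(fused, k_out)           # Convs.O(...)


# Example: a flattened "volume" of 6 voxels, one stable state dimension.
features = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
output = mic_block(features, [0.25, 0.5, 0.25], [0.0, 1.0, 0.0],
                   a=[-0.5], b=[1.0], c=[1.0])
```

With a negative `a`, an impulse at the first voxel decays geometrically but still influences every later position, whereas the size-3 convolution alone only reaches immediate neighbors. That is the sense in which the SSM branch supplies long-range context and the residual branch preserves local features.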
Through these contributions, nnMamba retains the local representation ability of CNNs while gaining the efficient global context processing ability of SSMs, which the authors present as a new standard for 3D biomedical image analysis.