EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation

Ao Chang,Jiajun Zeng,Ruobing Huang,Dong Ni
2024-09-26
Abstract: Convolutional neural networks have primarily led 3D medical image segmentation but may be limited by small receptive fields. Transformer models excel in capturing global relationships through self-attention but are challenged by high computational costs at high resolutions. Recently, Mamba, a state space model, has emerged as an effective approach for sequential modeling. Inspired by its success, we introduce a novel Mamba-based 3D medical image segmentation model called EM-Net. It not only efficiently captures attentive interaction between regions by integrating and selecting channels, but also effectively utilizes frequency domain to harmonize the learning of features across varying scales, while accelerating training speed. Comprehensive experiments on two challenging multi-organ datasets with other state-of-the-art (SOTA) algorithms show that our method exhibits better segmentation accuracy while requiring nearly half the parameter size of SOTA models and 2x faster training speed.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to address several key issues in 3D medical image segmentation:

1. **Limited Receptive Field**: Traditional Convolutional Neural Networks (CNNs) perform well in 3D medical image segmentation but have a small receptive field, which makes it difficult to capture global information.
2. **High Computational Cost**: Transformer models capture global relationships through self-attention, but their computational cost becomes prohibitive on high-resolution images.
3. **Spatial Relationship Modeling**: Sequence models struggle to model spatial relationships in 3D images and often rely on strong positional encoding, which falls short for complex spatial dependencies.
4. **Memory Consumption**: Medical image segmentation often requires substantial memory, which is a problem when hardware resources are limited.

To address these issues, the authors propose EM-Net, a novel 3D medical image segmentation model based on Mamba. The main contributions of EM-Net include:

1. **Channel Squeeze-Reinforce Mamba (CSRM) Module**: Captures relevant patterns in target regions through channel selection and adaptive calibration.
2. **Efficient Frequency-Domain Learning (EFL) Layer**: Uses the Fast Fourier Transform (FFT) to apply learnable frequency weighting, balancing the learning of features at different scales.
3. **Mamba-Enhanced Decoder**: Further improves segmentation performance while reducing memory consumption.

Experimental results on two challenging multi-organ datasets show that EM-Net achieves higher segmentation accuracy than existing state-of-the-art models while using nearly half their parameters and training about twice as fast.
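The idea of learnable frequency weighting behind the EFL layer can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `frequency_weighting`, the 8×8×8 volume, and the fixed weight arrays are assumptions made for illustration, and in a trained model the weights would be parameters updated by gradient descent rather than hand-set values.

```python
import numpy as np


def frequency_weighting(volume, weights):
    """Scale each frequency component of a 3D volume by a per-frequency weight.

    Hypothetical helper: `weights` stands in for a learnable parameter and must
    have the shape of the real-input FFT spectrum, (D, H, W // 2 + 1).
    """
    axes = (-3, -2, -1)
    # Forward real-input FFT over the three spatial axes.
    spectrum = np.fft.rfftn(volume, axes=axes)
    # Element-wise reweighting in the frequency domain.
    weighted = spectrum * weights
    # Back to the spatial domain, restoring the original spatial shape.
    return np.fft.irfftn(weighted, s=volume.shape[-3:], axes=axes)


rng = np.random.default_rng(0)
volume = rng.standard_normal((8, 8, 8))  # toy stand-in for a feature map
spectrum_shape = (8, 8, 8 // 2 + 1)

# All-ones weights leave the volume unchanged (up to floating-point error).
restored = frequency_weighting(volume, np.ones(spectrum_shape))

# Keeping only the zero-frequency (DC) component yields a constant volume
# equal to the mean intensity, i.e. the coarsest possible scale.
dc_only = np.zeros(spectrum_shape)
dc_only[0, 0, 0] = 1.0
mean_volume = frequency_weighting(volume, dc_only)
```

Weights concentrated on low frequencies emphasize coarse, large-scale structure, while weights on high frequencies emphasize fine detail, which is one way to read the claim that frequency weighting balances features across scales.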