DMSA-UNet: Dual Multi-Scale Attention makes UNet more strong for medical image segmentation

Xiang Li, Chong Fu, Qun Wang, Wenchao Zhang, Chiu-Wing Sham, Junxin Chen
DOI: https://doi.org/10.1016/j.knosys.2024.112050
IF: 8.139
2024-06-07
Knowledge-Based Systems
Abstract: Convolutional Neural Networks (CNNs), particularly UNet, have become prevalent in medical image segmentation tasks. However, CNNs inherently struggle to capture global dependencies owing to their intrinsic locality. Although Transformers have shown superior performance in modeling global dependencies, they encounter the challenges of high model complexity and dependence on large-scale pre-trained models. Furthermore, the current attention mechanisms of Transformers only consider single-scale feature interactions, making it difficult to analyze feature correlations at different scales in the same attention layer. In this paper, we propose DMSA-UNet, which strengthens the global analysis capability and maximally preserves the local inductive bias capability while maintaining low model complexity. Specifically, we reformulate vanilla self-attention as efficient Dual Multi-Scale Attention (DMSA) that captures multi-scale-enhanced global information along both spatial and channel dimensions with linear complexity and pixel granularity. We also introduce a context-gated linear unit in DMSA for each feature to obtain adaptive attention based on neighboring contexts. To preserve the convolutional properties, DMSAs are inserted directly between the UNet's convolutional blocks rather than replacing them. Because DMSA has multi-scale adaptive aggregation capability, the deepest convolutional block of UNet is removed to mitigate the noise interference caused by fixed convolutional kernels with large receptive fields. We further leverage efficient convolution to reduce computational redundancy. DMSA-UNet is highly competitive in terms of model complexity, with 33% fewer parameters and 15% fewer FLOPs (at 224 × 224 resolution) than UNet. Extensive experimental results on four different medical datasets demonstrate that DMSA-UNet outperforms other state-of-the-art approaches without any pre-trained models.
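The abstract does not spell out how DMSA reaches linear complexity, so the following is only a generic illustration of the idea, not the authors' formulation. It assumes a softmax-free attention in which the product (Q K^T) V can be regrouped as Q (K^T V), so that a C x C context matrix replaces the N x N score matrix. The function names, the toy matrices, and the channel width of 64 are all hypothetical, and treating every pixel of a 224 × 224 input as a token is an assumption made for the cost estimate.

```python
# Generic sketch of linear-complexity attention (NOT the paper's DMSA).
# Assumption: no softmax, so matrix-product associativity applies exactly.

def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]

def attention_quadratic(q, k, v):
    """(Q K^T) V: builds an N x N score matrix, cost grows with N^2."""
    scores = matmul(q, transpose(k))   # N x N
    return matmul(scores, v)           # N x C

def attention_linear(q, k, v):
    """Q (K^T V): builds a C x C context matrix, cost grows with N."""
    context = matmul(transpose(k), v)  # C x C
    return matmul(q, context)          # N x C

def quadratic_cost(n, c):
    """Multiply-adds for (Q K^T) V with N tokens and C channels."""
    return 2 * n * n * c

def linear_cost(n, c):
    """Multiply-adds for Q (K^T V) with N tokens and C channels."""
    return 2 * n * c * c

# Toy example: N = 3 tokens, C = 2 channels (hypothetical values).
Q = [[1, 2], [3, 4], [5, 6]]
K = [[1, 0], [0, 1], [1, 1]]
V = [[1, 2], [3, 4], [5, 6]]

same = attention_quadratic(Q, K, V) == attention_linear(Q, K, V)
print("both orderings agree:", same)

# Cost ratio if every pixel of a 224 x 224 map were a token, with C = 64.
n_tokens, channels = 224 * 224, 64
print("quadratic / linear cost:",
      quadratic_cost(n_tokens, channels) / linear_cost(n_tokens, channels))
```

Under these assumptions the cost ratio reduces to N / C, which is why regrouping pays off whenever the number of pixels is much larger than the channel width.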
computer science, artificial intelligence