Swin Deformable Attention Hybrid U-Net for Medical Image Segmentation

Lichao Wang, Jiahao Huang, Xiaodan Xing, Guang Yang
2023-09-27
Abstract: Medical image segmentation is a crucial task in the field of medical image analysis. Harmonizing the convolution and multi-head self-attention mechanism is a recent research focus in this field, with various combination methods proposed. However, the lack of interpretability of these hybrid models remains a common pitfall, limiting their practical application in clinical scenarios. To address this issue, we propose to incorporate the Shifted Window (Swin) Deformable Attention into a hybrid architecture to improve segmentation performance while ensuring explainability. Our proposed Swin Deformable Attention Hybrid UNet (SDAH-UNet) demonstrates state-of-the-art performance on both anatomical and lesion segmentation tasks. Moreover, we provide a direct and visual explanation of the model focalization and how the model forms it, enabling clinicians to better understand and trust the decision of the model. Our approach could be a promising solution to the challenge of developing accurate and interpretable medical image segmentation models.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two key issues in medical image segmentation:

1. **Improving Segmentation Performance**: The model combines convolution with multi-head self-attention (MSA) to improve segmentation accuracy. Existing hybrid models have already improved segmentation performance, but their poor interpretability limits their practical application in clinical scenarios.
2. **Enhancing Model Interpretability**: The paper proposes a hybrid UNet (SDAH-UNet) built around a shifted window (Swin) deformable attention module (SDMSA), which improves segmentation performance while keeping the model's decision-making process interpretable. This allows clinicians to better understand and trust the model's predictions.

The SDMSA module targets the redundant attention that arises in conventional attention mechanisms, which lack deformable capabilities. Parallel convolutional branches capture detailed texture features. Together these give more precise focus on the target, reduce computational redundancy, and improve overall performance. The paper reports state-of-the-art results on medical image datasets including cardiac anatomical structure segmentation (ACDC) and brain tumor segmentation (BraTS2020).
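To make the idea of shifted-window deformable attention more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head, identity query/key/value projections, and random offsets standing in for the offsets a trained network would predict. The function names (`window_partition`, `bilinear_sample`, `deformable_window_attention`) and all sizes are hypothetical, chosen only for illustration. The sketch cyclically shifts a feature map, splits it into windows, and lets each window attend to features sampled at offset (deformed) positions rather than on the fixed grid.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping (window, window, C) tiles."""
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0
    x = x.reshape(h // window, window, w // window, window, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window, window, c)

def bilinear_sample(feat, ys, xs):
    """Sample feat (H, W, C) at fractional positions (ys, xs), clamped to the bounds."""
    h, w, _ = feat.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[..., None]
    wx = (xs - x0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def deformable_window_attention(win, offsets):
    """Single-head attention inside one window.

    Queries come from the regular grid; keys and values are sampled at grid
    positions shifted by `offsets` (assumed here, learned in a real model).
    """
    n, _, c = win.shape
    gy, gx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    sampled = bilinear_sample(win, gy + offsets[..., 0], gx + offsets[..., 1])
    q = win.reshape(-1, c)
    kv = sampled.reshape(-1, c)
    logits = q @ kv.T / np.sqrt(c)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ kv).reshape(n, n, c), attn

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 4))                  # toy (H, W, C) feature map
shifted = np.roll(feat, shift=(-2, -2), axis=(0, 1))   # Swin-style cyclic shift
windows = window_partition(shifted, window=4)          # four 4x4 windows
offsets = rng.uniform(-1.0, 1.0, size=(4, 4, 2))       # placeholder for learned offsets
out, attn = deformable_window_attention(windows[0], offsets)
```

Because each row of `attn` is a distribution over the sampled positions, it can be mapped back to image coordinates to show where a query is looking. This is the kind of quantity that a visual explanation of model focus could be built on, although the paper's own visualization procedure is not specified in this summary.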