Swin-TransUper: Swin Transformer-based UperNet for medical image segmentation

Li, Chengyu; Zheng, Zhichao
DOI: https://doi.org/10.1007/s11042-024-19009-x
IF: 2.577
2024-04-03
Multimedia Tools and Applications
Abstract: Convolutional Neural Network (CNN)-based UNet and its variants have shown remarkable performance in medical image segmentation. However, these methods capture only local features, missing long-range spatial correlations, and cannot perform global modeling. Previous studies have shown that both local and global features are critical in computer vision. Motivated by these considerations, this paper proposes a pure Transformer model named Swin-TransUper. First, we extend UperNet by incorporating the hierarchical Swin Transformer with shifted windows, which enhances the model's global modeling capability. Second, we introduce an SPPM (Swin Pyramid Pooling Module) that performs multi-scale feature mining on the deepest features generated by the encoder, making full use of their semantic information. Finally, a multi-scale attention module aggregates the multi-scale feature information to obtain a more refined feature map. Our method achieves state-of-the-art DSC (Dice Similarity Coefficient) scores of 80.08%, 90.25%, and 90.62% on the Synapse multi-organ segmentation, ISIC2017, and ACDC datasets, respectively. In addition, experimental results on the ISIC2017 dataset show that Swin-TransUper achieves the best Sensitivity and Accuracy, at 91.20% and 96.44%, respectively. Our code is available at https://github.com/JianJianYin/Swin-TransUper.
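The DSC, Sensitivity, and Accuracy figures quoted in the abstract are standard overlap and confusion-matrix metrics: DSC = 2TP / (2TP + FP + FN), Sensitivity = TP / (TP + FN), and Accuracy = (TP + TN) / (TP + FP + FN + TN). The sketch below shows how they are commonly computed. It assumes binary or integer-labelled NumPy masks. It is not the authors' evaluation script, and the helper names (`dice`, `sensitivity`, `accuracy`, `mean_dice`), the empty-mask convention, and averaging over foreground classes are illustrative assumptions.

```python
import numpy as np


def confusion_counts(pred, target):
    """Return (tp, fp, fn, tn) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    tn = int(np.sum(~pred & ~target))
    return tp, fp, fn, tn


def dice(pred, target):
    """Dice Similarity Coefficient: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Assumption: if both masks are empty, treat it as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


def sensitivity(pred, target):
    """Sensitivity (recall): TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(pred, target)
    # Assumption: no positive pixels in the target counts as 1.0.
    return 1.0 if tp + fn == 0 else tp / (tp + fn)


def accuracy(pred, target):
    """Pixel accuracy: (TP + TN) / total number of pixels."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    return (tp + tn) / (tp + fp + fn + tn)


def mean_dice(pred_labels, target_labels, num_classes):
    """Average per-class Dice over foreground labels 1..num_classes-1.

    Hypothetical multi-organ variant (e.g. Synapse-style label maps);
    label 0 is assumed to be background and is excluded.
    """
    pred_labels = np.asarray(pred_labels)
    target_labels = np.asarray(target_labels)
    scores = [dice(pred_labels == c, target_labels == c)
              for c in range(1, num_classes)]
    return float(np.mean(scores))


if __name__ == "__main__":
    pred = np.array([[0, 1, 1], [0, 1, 0]])
    gt = np.array([[0, 1, 0], [0, 1, 1]])
    # TP=2, FP=1, FN=1, TN=2 for this toy binary example.
    print(dice(pred, gt), sensitivity(pred, gt), accuracy(pred, gt))
```

In the toy example, two pixels are correctly segmented, one is over-segmented, and one is missed. All three metrics therefore come out to 2/3. Published numbers also depend on whether scores are averaged per image, per volume, or over the whole dataset, and this sketch does not settle that choice.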
Computer Science, Information Systems; Computer Science, Theory & Methods; Engineering, Electrical & Electronic; Computer Science, Software Engineering
What problem does this paper attempt to address?