MS-Twins: Multi-Scale Deep Self-Attention Networks for Medical Image Segmentation

Jing Xu
2024-09-17
Abstract:Although transformer is preferred in natural language processing, some studies has only been applied to the field of medical imaging in recent years. For its long-term dependency, the transformer is expected to contribute to unconventional convolution neural net conquer their inherent spatial induction bias. The lately suggested transformer-based segmentation method only uses the transformer as an auxiliary module to help encode the global context into a convolutional representation. How to optimally integrate self-attention with convolution has not been investigated in depth. To solve the problem, this paper proposes MS-Twins (Multi-Scale Twins), which is a powerful segmentation model on account of the bond of self-attention and convolution. MS-Twins can better capture semantic and fine-grained information by combining different scales and cascading features. Compared with the existing network structure, MS-Twins has made progress on the previous method based on the transformer of two in common use data sets, Synapse and ACDC. In particular, the performance of MS-Twins on Synapse is 8% higher than SwinUNet. Even compared with nnUNet, the best entirely convoluted medical image segmentation network, the performance of MS-Twins on Synapse and ACDC still has a bit advantage.
Image and Video Processing,Computer Vision and Pattern Recognition
What problem does this paper attempt to address?