Stitching Transformer and Convolution in U-Net for Medical Image Segmentation

Junhao Pan, Zexuan Ji
DOI: https://doi.org/10.1109/cac59555.2023.10451981
2023-01-01
Abstract: Medical image segmentation is a crucial aspect of medical image analysis. Although U-Net is one of the most widely used models for medical image segmentation, the intrinsic locality of its convolutions often prevents it from learning global context and long-range spatial relations. Recent research has attempted to address this issue by introducing the Transformer into U-Net. However, the Transformer requires large-scale pre-training and does not work well on small datasets. Additionally, medical images tend to have low contrast and fuzzy boundaries, so segmenting them depends on both global and local information. To tackle these problems, we propose a novel network that stitches Transformer and convolution in U-Net, following the multiscale symmetric structure of U-Net to make full use of the respective advantages of each. Specifically, we propose a CTC block that inserts a self-attention layer between two serial convolution layers. The first convolution layer integrates position information, the self-attention layer extracts long-range dependencies, and the second convolution layer exploits inductive bias and aggregates global information. Moreover, we construct a cross-attention module in the skip connections to further strengthen the recovery of feature information in the decoder. We evaluate the proposed model on two public datasets, and the experimental results demonstrate that our method outperforms state-of-the-art methods. The code for our model will be available on GitHub.
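The abstract describes the CTC block (convolution, then self-attention, then convolution) and a cross-attention module in the skip connections, but gives no layer details, and the authors' code is not included here. The NumPy sketch below only illustrates that data flow under loudly stated assumptions: single-head scaled dot-product attention over spatial positions, zero-padded 3x3 convolutions, ReLU activations, a residual connection around the attention layer, and a skip module in which decoder features query encoder features before concatenation. The names `ctc_block` and `cross_attention_skip` and all parameter keys are hypothetical; this is not the paper's implementation.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv3x3(x, w, b):
    """Zero-padded ('same') 3x3 convolution (cross-correlation).
    x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,)."""
    c_in, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps H, W
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            patch = xp[:, i:i + h, j:j + wd]  # input shifted by kernel offset (i, j)
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out + b[:, None, None]

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_feat, kv_feat, wq, wk, wv):
    """Single-head scaled dot-product attention; every spatial position is a token.
    q_feat: (C, Hq, Wq), kv_feat: (C, Hk, Wk), wq/wk/wv: (C, C)."""
    c, hq, wq_len = q_feat.shape
    q = (wq @ q_feat.reshape(c, -1)).T   # (Nq, C)
    k = (wk @ kv_feat.reshape(c, -1)).T  # (Nk, C)
    v = (wv @ kv_feat.reshape(c, -1)).T  # (Nk, C)
    attn = softmax(q @ k.T / np.sqrt(c))  # (Nq, Nk): each query attends to all keys
    return (attn @ v).T.reshape(c, hq, wq_len)

def ctc_block(x, p):
    """Hypothetical CTC block: conv -> self-attention -> conv."""
    h = relu(conv3x3(x, p["w1"], p["b1"]))               # conv 1: local features
    h = h + attention(h, h, p["wq"], p["wk"], p["wv"])   # self-attention (residual assumed)
    return relu(conv3x3(h, p["w2"], p["b2"]))            # conv 2: local aggregation

def cross_attention_skip(enc, dec, p):
    """Hypothetical skip module: decoder features query encoder features,
    then the result is concatenated with the encoder skip along channels."""
    fused = attention(dec, enc, p["cq"], p["ck"], p["cv"])
    return np.concatenate([enc, dec + fused], axis=0)

rng = np.random.default_rng(0)
C_IN, C, H, W = 1, 8, 16, 16

def init(*shape):
    return rng.normal(0.0, 0.1, shape)

params = {
    "w1": init(C, C_IN, 3, 3), "b1": np.zeros(C),
    "w2": init(C, C, 3, 3), "b2": np.zeros(C),
    "wq": init(C, C), "wk": init(C, C), "wv": init(C, C),
    "cq": init(C, C), "ck": init(C, C), "cv": init(C, C),
}
image = rng.normal(size=(C_IN, H, W))
enc = ctc_block(image, params)              # encoder-stage features
dec = rng.normal(size=(C, H, W))            # stand-in for upsampled decoder features
skip = cross_attention_skip(enc, dec, params)
print(enc.shape, skip.shape)  # prints (8, 16, 16) (16, 16, 16)
```

Treating every pixel as a token makes the attention cost grow with (H·W)², which is one reason hybrids of this kind typically place attention on downsampled feature maps; where the paper actually places its CTC blocks is not stated in the abstract.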