A 3D Medical Image Segmentation Framework Fusing Convolution and Transformer Features

Fazhan Zhu, Jiaxing Lv, Kun Lu, Wenyan Wang, Hongshou Cong, Jun Zhang, Peng Chen, Yuan Zhao, Ziheng Wu
DOI: https://doi.org/10.1007/978-3-031-13870-6_63
2022-01-01
Abstract: Accurate segmentation of medical images provides a reliable basis for clinical diagnosis and pathology research and helps doctors make more accurate diagnoses, and deep learning can accelerate this process. Convolutional Neural Networks (CNNs) and Transformers have become the two mainstream deep learning architectures in medical image segmentation. However, the Transformer architecture has limited ability to capture local inductive bias and is at a disadvantage on small-sample datasets. Many theoretical and experimental studies show that these problems can be effectively addressed by fusing convolution and Transformer features. In this manuscript, a new U-shaped segmentation model based on convolution and the Swin Transformer framework is proposed, called CST-UNET. Its encoder combines the advantages of dilated convolution and the Transformer, allowing the model to capture both semantic inductive bias information and long-range information. It also has fewer parameters and lower FLOPs. Even when trained on a small-sample dataset, the framework retains strong generalization ability. On the BraTS2021 dataset, the Dice coefficients for ET, TC, and WT are 85.46%, 89.38%, and 92.35%, respectively, and the HD95 values are 7.95, 5.06, and 4.07, respectively.
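The Dice coefficient reported in the abstract measures the overlap between a predicted mask and a reference mask, Dice = 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch of that formula on flattened binary masks, not the authors' evaluation code. The function name `dice_coefficient`, the toy masks, and the convention that two empty masks score 1.0 are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Return 2|P ∩ T| / (|P| + |T|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks (a 3D volume would be flattened the same way).
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(f"Dice: {dice_coefficient(pred, target):.4f}")  # prints "Dice: 0.6667"
```

For a multi-region task such as BraTS, a score like this would typically be computed separately for each region's binary mask (for example ET, TC, and WT) and reported as a percentage.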