Dual-branch Transformer for semi-supervised medical image segmentation

Xiaojie Huang,Yating Zhu,Minghan Shao,Ming Xia,Xiaoting Shen,Pingli Wang,Xiaoyan Wang
DOI: https://doi.org/10.1002/acm2.14483
Abstract:Purpose: In recent years, the use of deep learning for medical image segmentation has become a popular trend, but its development also faces some challenges. Firstly, due to the specialized nature of medical data, precise annotation is time-consuming and labor-intensive. Training neural networks effectively with limited labeled data is a significant challenge in medical image analysis. Secondly, convolutional neural networks commonly used for medical image segmentation research often focus on local features in images. However, the recognition of complex anatomical structures or irregular lesions often requires the assistance of both local and global information, which has led to a bottleneck in its development. Addressing these two issues, in this paper, we propose a novel network architecture. Methods: We integrate a shift window mechanism to learn more comprehensive semantic information and employ a semi-supervised learning strategy by incorporating a flexible amount of unlabeled data. Specifically, a typical U-shaped encoder-decoder structure is applied to obtain rich feature maps. Each encoder is designed as a dual-branch structure, containing Swin modules equipped with windows of different size to capture features of multiple scales. To effectively utilize unlabeled data, a level set function is introduced to establish consistency between the function regression and pixel classification. Results: We conducted experiments on the COVID-19 CT dataset and DRIVE dataset and compared our approach with various semi-supervised and fully supervised learning models. On the COVID-19 CT dataset, we achieved a segmentation accuracy of up to 74.56%. Our segmentation accuracy on the DRIVE dataset was 79.79%. Conclusions: The results demonstrate the outstanding performance of our method on several commonly used evaluation metrics. The high segmentation accuracy of our model demonstrates that utilizing Swin modules with different window sizes can enhance the feature extraction capability of the model, and the level set function can enable semi-supervised models to more effectively utilize unlabeled data. This provides meaningful insights for the application of deep learning in medical image segmentation. Our code will be released once the manuscript is accepted for publication.
What problem does this paper attempt to address?