Abstract:

Purpose: In recent years, deep learning has become a popular approach to medical image segmentation, but it still faces two challenges. First, because medical data are highly specialized, precise annotation is time-consuming and labor-intensive, so training neural networks effectively with limited labeled data is a significant challenge in medical image analysis. Second, the convolutional neural networks commonly used for medical image segmentation tend to focus on local image features, whereas recognizing complex anatomical structures or irregular lesions often requires both local and global information; this has become a bottleneck. To address these two issues, we propose a novel network architecture.

Methods: We integrate a shifted window mechanism to learn more comprehensive semantic information and employ a semi-supervised learning strategy that incorporates a flexible amount of unlabeled data. Specifically, a typical U-shaped encoder-decoder structure is applied to obtain rich feature maps. Each encoder is designed as a dual-branch structure containing Swin modules equipped with windows of different sizes to capture multi-scale features. To use unlabeled data effectively, a level set function is introduced to establish consistency between level set function regression and pixel classification.

Results: We conducted experiments on the COVID-19 CT dataset and the DRIVE dataset and compared our approach with various semi-supervised and fully supervised learning models. On the COVID-19 CT dataset, we achieved a segmentation accuracy of up to 74.56%; on the DRIVE dataset, our segmentation accuracy was 79.79%.

Conclusions: The results demonstrate the outstanding performance of our method on several commonly used evaluation metrics. The high segmentation accuracy of our model shows that Swin modules with different window sizes can enhance the model's feature extraction capability, and that the level set function enables semi-supervised models to use unlabeled data more effectively. This provides meaningful insights for the application of deep learning in medical image segmentation. Our code will be released once the manuscript is accepted for publication.
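
The abstract does not give the exact form of the consistency between level set regression and pixel classification, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a common dual-task formulation: the regressed level set map is turned into a soft mask with a smoothed Heaviside step (here a sigmoid), and the consistency term is the mean squared difference between that mask and the pixel-wise foreground probability. The function names, the sign convention (negative inside the object), and the sharpness parameter `k` are all hypothetical.

```python
import numpy as np


def level_set_to_soft_mask(phi, k=10.0):
    """Map a level set map to a soft mask with a smoothed Heaviside step.

    Assumed convention: phi < 0 inside the object, phi > 0 outside.
    k (hypothetical hyperparameter) controls how sharp the step is.
    """
    z = np.clip(k * np.asarray(phi, dtype=float), -50.0, 50.0)  # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(z))  # equals sigmoid(-k * phi)


def consistency_loss(seg_prob, phi, k=10.0):
    """Mean squared disagreement between the two task outputs."""
    soft_mask = level_set_to_soft_mask(phi, k)
    return float(np.mean((np.asarray(seg_prob, dtype=float) - soft_mask) ** 2))


# Toy 2x2 "image": two pixels inside the object, two outside.
phi = np.array([[-1.0, 1.0], [2.0, -2.0]])
agree = level_set_to_soft_mask(phi)   # classification output agrees with the regression output
disagree = 1.0 - agree                # classification output predicts the opposite

print(consistency_loss(agree, phi))     # zero: the two outputs are consistent
print(consistency_loss(disagree, phi))  # close to 1: the outputs contradict each other
```

Because a term of this kind compares two predictions of the same network and needs no ground-truth mask, it can be evaluated on unlabeled images, which is what makes it usable in a semi-supervised setting.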
