Multi-Level Global Context Cross Consistency Model for Semi-Supervised Ultrasound Image Segmentation with Diffusion Model

Fenghe Tang, Jianrui Ding, Lingtao Wang, Min Xian, Chunping Ning
DOI: https://doi.org/10.48550/arXiv.2305.09447
2023-05-17
Abstract: Medical image segmentation is a critical step in computer-aided diagnosis, and convolutional neural networks are popular segmentation networks nowadays. However, the inherent local operation characteristics make it difficult to focus on the global contextual information of lesions with different positions, shapes, and sizes. Semi-supervised learning can be used to learn from both labeled and unlabeled samples, alleviating the burden of manual labeling. However, obtaining a large number of unlabeled images in medical scenarios remains challenging. To address these issues, we propose a Multi-level Global Context Cross-consistency (MGCC) framework that uses images generated by a Latent Diffusion Model (LDM) as unlabeled images for semi-supervised learning. The framework consists of two stages. In the first stage, an LDM is used to generate synthetic medical images, which reduces the workload of data annotation and addresses privacy concerns associated with collecting medical data. In the second stage, varying levels of global context noise perturbation are added to the input of the auxiliary decoder, and output consistency is maintained between decoders to improve the representation ability. Experiments conducted on open-source breast ultrasound and private thyroid ultrasound datasets demonstrate the effectiveness of our framework in bridging the probability distribution and the semantic representation of the medical image. Our approach enables the effective transfer of probability distribution knowledge to the segmentation network, resulting in improved segmentation accuracy. The code is available at https://github.com/FengheTan9/Multi-Level-Global-Context-Cross-Consistency.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses medical image segmentation, and in particular the challenges of ultrasound image segmentation. It targets the following key issues:

1. **Acquisition of global context information**: Convolutional neural networks (CNNs) struggle to attend to global context when lesions vary in position, shape, and size. The convolution operation is inherently local, so it does not readily capture long-range dependencies in the image.
2. **Lack of labeled data**: Medical image segmentation requires a large amount of labeled data for training, but high-quality labels are hard to obtain. Labeling is time-consuming and labor-intensive, and it also raises patient privacy concerns.
3. **Difficulty in obtaining unlabeled data**: In medical scenarios, collecting a large number of unlabeled images is also challenging, especially when patient privacy must be protected.
4. **Effectiveness of semi-supervised learning**: Semi-supervised learning trains a model on a small amount of labeled data together with a large amount of unlabeled data, which reduces the burden of manual labeling. However, differences in style and content between ultrasound images from different sources may reduce its effectiveness.

To solve these problems, the paper proposes the Multi-level Global Context Cross-consistency (MGCC) framework, which uses a Latent Diffusion Model (LDM) to generate synthetic ultrasound images that serve as unlabeled data for semi-supervised learning.
This approach reduces the data-labeling workload and removes the obstacle of collecting a large amount of private unlabeled data. The paper also adds global context noise perturbations at different levels to the inputs of the auxiliary decoders and enforces consistency between decoder outputs, which improves the model's ability to represent target objects with different morphologies and positions.

In summary, the main goal of the paper is to improve the accuracy and robustness of medical ultrasound image segmentation by combining generative models with semi-supervised learning, especially when labeled data is limited.
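
The consistency idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy per-pixel "decoders", the Gaussian noise (a stand-in for the paper's global context noise perturbation), the mean-squared-error consistency measure, and all function names are assumptions made for illustration only. The sketch shows auxiliary decoders receiving perturbed inputs at different noise levels while their outputs are pulled toward the main decoder's output; no labels are needed, which is why synthetic images can be used as unlabeled data.

```python
import math
import random


def sigmoid(x):
    """Map a raw score to a foreground probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def perturb(features, noise_scale, rng):
    """Add zero-mean Gaussian noise to each feature value.

    Assumption: Gaussian noise stands in for the paper's global context
    noise perturbation, whose exact form is not given in this summary.
    """
    return [f + rng.gauss(0.0, noise_scale) for f in features]


def decode(features, weight, bias):
    """Hypothetical toy decoder: a per-pixel affine map followed by a sigmoid."""
    return [sigmoid(weight * f + bias) for f in features]


def mse(a, b):
    """Mean squared error between two equally sized prediction lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_consistency_loss(features, main_params, aux_params, noise_scales, rng):
    """Average disagreement between the main decoder and each auxiliary decoder.

    Each auxiliary decoder sees the shared features perturbed at its own
    noise level; the main decoder sees the clean features.
    """
    main_pred = decode(features, *main_params)
    losses = []
    for params, scale in zip(aux_params, noise_scales):
        aux_pred = decode(perturb(features, scale, rng), *params)
        losses.append(mse(main_pred, aux_pred))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    # Flattened encoder features of one unlabeled (e.g. synthetic) image.
    features = [0.5, -1.0, 2.0, 0.0]
    loss = cross_consistency_loss(
        features,
        main_params=(1.0, 0.0),
        aux_params=[(1.0, 0.0), (0.9, 0.1)],
        noise_scales=[0.1, 0.5],  # two perturbation levels
        rng=rng,
    )
    print(f"consistency loss: {loss:.4f}")
```

In a semi-supervised setup, a term like this would typically be added, with some weighting, to a supervised segmentation loss computed on the labeled images; the weighting scheme is likewise an assumption and is not specified in the summary above.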