Expectation Maximization Pseudo Labels

Moucheng Xu, Yukun Zhou, Chen Jin, Marius de Groot, Daniel C. Alexander, Neil P. Oxtoby, Yipeng Hu, Joseph Jacob
2024-01-26
Abstract: In this paper, we study pseudo-labelling. Pseudo-labelling employs raw inferences on unlabelled data as pseudo-labels for self-training. We elucidate the empirical successes of pseudo-labelling by establishing a link between this technique and the Expectation Maximisation algorithm. Through this, we realise that the original pseudo-labelling serves as an empirical estimation of its more comprehensive underlying formulation. Following this insight, we present a full generalisation of pseudo-labels under Bayes' theorem, termed Bayesian Pseudo Labels. Subsequently, we introduce a variational approach to generate these Bayesian Pseudo Labels, involving the learning of a threshold to automatically select high-quality pseudo-labels. In the remainder of the paper, we showcase the applications of pseudo-labelling and its generalised form, Bayesian Pseudo-Labelling, in the semi-supervised segmentation of medical images. Specifically, we focus on: 1) 3D binary segmentation of lung vessels from CT volumes; 2) 2D multi-class segmentation of brain tumours from MRI volumes; 3) 3D binary segmentation of whole brain tumours from MRI volumes; and 4) 3D binary segmentation of prostate from MRI volumes. We further demonstrate that pseudo-labels can enhance the robustness of the learned representations. The code is released in the following GitHub repository: https://github.com/moucheng2017/EMSSL
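The link between pseudo-labelling and EM described in the abstract can be illustrated with a toy example. The sketch below is not the implementation in the EMSSL repository; it is a minimal illustration assuming a logistic-regression classifier on synthetic one-dimensional per-pixel intensities (standing in for a segmentation network) and a fixed, hypothetical confidence threshold of 0.9. In the E-step, the current model's predicted probabilities on unlabelled pixels are collapsed into hard pseudo-labels, a point estimate of the posterior over the missing labels; in the M-step, the model is refitted on the labelled pixels plus the confidently pseudo-labelled ones. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, w, b, lr=0.5, steps=200):
    """M-step (toy): gradient descent on the mean binary cross-entropy."""
    for _ in range(steps):
        p = sigmoid(w * x + b)
        grad = p - y  # derivative of the BCE loss w.r.t. the logit
        w -= lr * np.mean(grad * x)
        b -= lr * np.mean(grad)
    return w, b

def e_step(x_unlabelled, w, b, threshold=0.9):
    """E-step (empirical): hard pseudo-labels, kept only where the model is confident."""
    p = sigmoid(w * x_unlabelled + b)
    pseudo = (p >= 0.5).astype(float)
    confident = np.maximum(p, 1.0 - p) >= threshold
    return pseudo, confident

def em_pseudo_label(x_l, y_l, x_u, rounds=5, threshold=0.9):
    # Initialise on the small labelled set only.
    w, b = fit_logistic(x_l, y_l, 0.0, 0.0)
    for _ in range(rounds):
        pseudo, mask = e_step(x_u, w, b, threshold)
        # Refit on labelled pixels plus confidently pseudo-labelled pixels.
        x_all = np.concatenate([x_l, x_u[mask]])
        y_all = np.concatenate([y_l, pseudo[mask]])
        w, b = fit_logistic(x_all, y_all, w, b)
    return w, b

# Hypothetical toy data: foreground intensities around +2, background around -2.
rng = np.random.default_rng(0)
x_l = np.array([-2.0, -1.5, 1.5, 2.0])
y_l = np.array([0.0, 0.0, 1.0, 1.0])
x_u = np.concatenate([rng.normal(-2, 0.7, 500), rng.normal(2, 0.7, 500)])
y_u_true = np.concatenate([np.zeros(500), np.ones(500)])

w, b = em_pseudo_label(x_l, y_l, x_u)
acc = np.mean((sigmoid(w * x_u + b) >= 0.5) == y_u_true)
print(f"accuracy on unlabelled pixels: {acc:.3f}")
```

The fixed threshold here is the simplification that Bayesian Pseudo Labels replace with a learned one; this sketch does not attempt that step.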
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses how pseudo-labelling can be used in medical image segmentation to alleviate the scarcity of annotated data. Specifically:

1. **Scarcity of Annotated Data**: Training deep learning models typically requires large amounts of annotated data, which are costly and time-consuming to obtain. In medicine, high-quality annotations require the involvement of clinical experts, which raises the cost further.
2. **Application of Semi-Supervised Learning Methods**: To address the shortage of annotations, the paper proposes a semi-supervised learning method based on pseudo-labelling, which combines a small amount of annotated data with a large amount of unannotated data to improve model performance.
3. **Theoretical Foundation and Improvements**: The paper establishes a connection between pseudo-labelling and the classical Expectation-Maximization (EM) algorithm, and generalizes pseudo-labels as Bayesian Pseudo Labels. These are generated by a variational approach that learns a threshold for selecting high-quality pseudo-labels, thereby reducing the noise introduced by incorrect pseudo-labels.
4. **Application Validation**: The paper validates the approach on several medical image segmentation tasks:
   - 3D binary segmentation of lung vessels (CT images)
   - 2D multi-class segmentation of brain tumors (MRI images)
   - 3D binary segmentation of whole brain tumors (MRI images)
   - 3D binary segmentation of the prostate (MRI images)

Through these applications, the paper demonstrates the potential of pseudo-labelling to improve model accuracy and the robustness of the learned representations.
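For multi-class tasks such as the 2D brain tumor segmentation listed above, a common way to form pseudo-labels is to take the arg-max class of the softmax output and to exclude low-confidence pixels from the unsupervised loss. The NumPy sketch below is a minimal illustration under stated assumptions: logits have shape (C, H, W), the confidence threshold is fixed and hypothetical, and the helper names are hypothetical. It does not reproduce the learned, variational threshold of Bayesian Pseudo Labels.

```python
import numpy as np

def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multiclass_pseudo_labels(logits, threshold=0.8):
    """Return arg-max pseudo-labels (H, W) and a boolean confidence mask (H, W)."""
    probs = softmax(logits, axis=0)
    labels = probs.argmax(axis=0)
    mask = probs.max(axis=0) >= threshold
    return labels, mask

def masked_cross_entropy(logits, labels, mask, eps=1e-12):
    """Mean cross-entropy over confident pixels only; 0.0 if no pixel passes."""
    if not mask.any():
        return 0.0
    probs = softmax(logits, axis=0)
    # Probability each pixel assigns to its pseudo-label class.
    picked = np.take_along_axis(probs, labels[None, ...], axis=0)[0]
    return float(-np.log(picked[mask] + eps).mean())

# Hypothetical 3-class logits for a 2x2 image.
logits = np.array([
    [[4.0, 0.0], [0.0, 0.1]],
    [[0.0, 4.0], [0.0, 0.0]],
    [[0.0, 0.0], [4.0, 0.0]],
])
labels, mask = multiclass_pseudo_labels(logits, threshold=0.8)
# labels -> [[0, 1], [2, 0]]; only pixel (1, 1) falls below the threshold
loss = masked_cross_entropy(logits, labels, mask)
```

In a real training loop, the logits scored by `masked_cross_entropy` would typically come from a separate forward pass on the unlabelled image (for example under a different augmentation), with the pseudo-labels treated as fixed targets.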