The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation

Muyang Qiu, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao
2024-08-01
Abstract: Despite the recent success of domain generalization in medical image segmentation, voxel-wise annotation for all source domains remains a huge burden. Semi-supervised domain generalization has been proposed very recently to combat this challenge by leveraging limited labeled data along with abundant unlabeled data collected from multiple medical institutions, depending on precisely harnessing unlabeled data while improving generalization simultaneously. In this work, we observe that domain shifts between medical institutions cause disparate feature statistics, which significantly deteriorates pseudo-label quality due to an unexpected normalization process. Nevertheless, this phenomenon could be exploited to facilitate unseen domain generalization. Therefore, we propose 1) multiple statistics-individual branches to mitigate the impact of domain shifts for reliable pseudo-labels and 2) one statistics-aggregated branch for domain-invariant feature learning. Furthermore, to simulate unseen domains with statistics difference, we approach this from two aspects, i.e., a perturbation with histogram matching at image level and a random batch normalization selection strategy at feature level, producing diverse statistics to expand the training distribution. Evaluation results on three medical image datasets demonstrate the effectiveness of our method compared with recent SOTA methods. The code is available at https://github.com/qiumuyang/SIAB.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses Semi-supervised Domain Generalization (SSDG) for medical image segmentation and proposes a method to improve generalization to unseen domains. The authors observe that domain shifts between medical institutions produce differing feature statistics, and that these differences degrade pseudo-label quality through the normalization process. To address this, the paper proposes two key components:

1. **Statistics-Individual Branches (SIBs)**: To limit the effect of cross-domain statistics differences on pseudo-labels, the authors use multiple branches, one per medical institution. Each branch captures and adapts to the feature statistics of its own domain, which reduces the influence of domain shift at prediction time and yields more reliable pseudo-labels.
2. **Statistics-Aggregated Branch (SAB)**: As a complement to the SIBs, the SAB uses data from all medical institutions to generate predictions and learn domain-invariant features, improving the model's overall generalization.

The paper also simulates unseen domains by perturbing the training data at two levels: histogram matching at the image level and random batch normalization selection at the feature level. Training the SAB to stay consistent under these perturbations expands the training distribution and promotes more robust feature learning.

In summary, the paper aims to improve both pseudo-label quality and generalization in semi-supervised medical image segmentation, particularly when the data come from different medical institutions.
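To make the two branch types described above concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the class `MultiStatNorm`, its method names, and the site labels are hypothetical, features are reduced to flat lists of floats, and the statistics are computed once from the given data rather than tracked as running averages during training. It only illustrates the idea of normalizing with per-domain statistics, with statistics pooled over all domains, or with a randomly selected domain's statistics.

```python
import math
import random
from statistics import fmean, pvariance

EPS = 1e-5  # small constant for numerical stability, as in batch normalization


def normalize(values, mean, var):
    """Standardize values with the given mean and variance (no affine parameters)."""
    return [(v - mean) / math.sqrt(var + EPS) for v in values]


class MultiStatNorm:
    """Toy normalization layer (hypothetical): one (mean, variance) pair per
    source domain plus one pair aggregated over all domains."""

    def __init__(self, features_by_domain):
        # Statistics-individual: each domain keeps its own statistics.
        self.individual = {
            domain: (fmean(x), pvariance(x))
            for domain, x in features_by_domain.items()
        }
        # Statistics-aggregated: one set of statistics pooled over all domains.
        pooled = [v for x in features_by_domain.values() for v in x]
        self.aggregated = (fmean(pooled), pvariance(pooled))

    def forward_individual(self, values, domain):
        return normalize(values, *self.individual[domain])

    def forward_aggregated(self, values):
        return normalize(values, *self.aggregated)

    def forward_random(self, values, rng):
        # Feature-level perturbation: normalize with a randomly chosen
        # domain's statistics instead of the sample's own.
        domain = rng.choice(sorted(self.individual))
        return domain, normalize(values, *self.individual[domain])


# Two made-up "institutions" whose features differ in mean and spread.
features = {"site_a": [0.9, 1.0, 1.1, 1.0], "site_b": [4.0, 5.0, 6.0, 5.0]}
layer = MultiStatNorm(features)
own = layer.forward_individual(features["site_b"], "site_b")
pooled = layer.forward_aggregated(features["site_b"])
picked, perturbed = layer.forward_random(features["site_b"], random.Random(0))
```

In this toy setup, `own` is centered at zero because `site_b` is normalized with its own statistics, while `pooled` stays shifted to positive values because the aggregated mean sits between the two sites. That gap is a small-scale picture of the statistics difference the summary refers to.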
Experimental results on three medical image datasets show that the method compares favorably with recent state-of-the-art methods.
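The image-level perturbation named in the abstract, histogram matching, can be sketched with a simple quantile mapping. This is an illustrative version only, assuming images are flattened to lists of intensities. The function name `match_histogram` is hypothetical, and the paper's actual procedure (for example, how reference images are chosen) is not specified here.

```python
from bisect import bisect_right


def match_histogram(source, reference):
    """Map each source intensity to the reference intensity at the same
    empirical quantile, so the output follows the reference distribution
    while preserving the rank order of the source pixels."""
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for v in source:
        # Rank of v in the source image (1 .. n_src), i.e. its empirical CDF
        # value multiplied by n_src.
        rank = bisect_right(src_sorted, v)
        # Reference index at the same quantile, using integer ceiling division.
        idx = (rank * n_ref + n_src - 1) // n_src - 1
        matched.append(ref_sorted[idx])
    return matched


# A source "image" restyled with the intensity distribution of a reference
# "image" from another (made-up) institution.
source_pixels = [30, 10, 40, 20]
reference_pixels = [100, 110, 120, 130]
restyled = match_histogram(source_pixels, reference_pixels)
print(restyled)  # [120, 100, 130, 110]
```

The restyled image keeps the spatial structure of the source (the brightest pixel stays brightest) but takes on the intensity statistics of the reference, which is one way to produce training images with statistics the model has not seen.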