Knowledge distillation for semi-supervised domain adaptation

Mauricio Orbes-Arteaga, Jorge Cardoso, Lauge Sørensen, Christian Igel, Sebastien Ourselin, Marc Modat, Mads Nielsen, Akshay Pai
DOI: https://doi.org/10.48550/arXiv.1908.07355
2019-08-16
Abstract: In the absence of sufficient data variation (e.g., scanner and protocol variability) in annotated data, deep neural networks (DNNs) tend to overfit during training. As a result, their performance is significantly lower on data from unseen sources compared to the performance on data from the same source as the training data. Semi-supervised domain adaptation methods can alleviate this problem by tuning networks to new target domains without the need for annotated data from these domains. Adversarial domain adaptation (ADA) methods are a popular choice that aim to train networks in such a way that the features generated are domain agnostic. However, these methods require careful dataset-specific selection of hyperparameters such as the complexity of the discriminator in order to achieve a reasonable performance. We propose to use knowledge distillation (KD) -- an efficient way of transferring knowledge between different DNNs -- for semi-supervised domain adaptation of DNNs. It does not require dataset-specific hyperparameter tuning, making it generally applicable. The proposed method is compared to ADA for segmentation of white matter hyperintensities (WMH) in magnetic resonance imaging (MRI) scans generated by scanners that are not a part of the training set. Compared with both the baseline DNN (trained on source domain only and without any adaptation to target domain) and with using ADA for semi-supervised domain adaptation, the proposed method achieves significantly higher WMH Dice scores.
Machine Learning, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses the drop in performance that deep neural networks (DNNs) show in medical image segmentation when the training and test data come from different distributions. Labeling medical imaging data is expensive, and large labeled datasets are hard to obtain, so training sets often fail to cover the full range of data variation (for example, differences in scanners and protocols). As a result, DNNs perform poorly on data from unseen sources.

To address this, the paper proposes a semi-supervised domain adaptation method based on knowledge distillation (KD), which removes the need for labeled data in the target domain and improves generalization to new data sources. Knowledge is transferred from a teacher model trained on the source domain to a student model, with the aim of letting the student adapt to the data characteristics of the target domain and so segment target-domain images more accurately.

The paper compares a baseline model (trained only on the source domain), adversarial domain adaptation (ADA), and the proposed KD method on white matter hyperintensities (WMH) segmentation across several clinical scenarios. According to the reported results, KD outperforms ADA in most cases, especially on small lesions; the exception is adaptation from the Utrecht clinic to the Singapore clinic. A further advantage of KD over ADA is its simpler design: it does not require extensive changes to the network architecture, and good performance can be achieved by selecting an appropriate temperature parameter.
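The role of the temperature parameter can be illustrated with a minimal sketch of a generic distillation loss. This is not the authors' implementation: the function names, the default temperature of 2.0, and the choice of cross-entropy between softened teacher and student outputs are assumptions based on the standard KD formulation, since the exact loss is not given in this summary.

```python
import numpy as np


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over the last axis after dividing the logits by a temperature.

    Higher temperatures give softer (less peaked) class probabilities.
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0, eps=1e-12):
    """Mean cross-entropy between softened teacher and student distributions.

    Both inputs have shape (..., num_classes), e.g. one row per voxel.
    No ground-truth labels are used, so the loss can be evaluated on
    unlabeled target-domain images (assumed setup, hypothetical helper).
    """
    teacher_probs = softmax_with_temperature(teacher_logits, temperature)
    student_probs = softmax_with_temperature(student_logits, temperature)
    per_voxel = -(teacher_probs * np.log(student_probs + eps)).sum(axis=-1)
    return float(per_voxel.mean())


# Toy example: two voxels, two classes (background, lesion).
teacher_logits = np.array([[4.0, 0.0], [0.0, 2.0]])
student_logits = np.array([[1.0, 0.5], [0.5, 1.0]])
loss = distillation_loss(student_logits, teacher_logits, temperature=2.0)
```

In a training loop, the student's weights would be updated to minimize this loss on target-domain images, so that the teacher's softened predictions act as the supervisory signal in place of manual annotations.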