Coarse-to-Fine Learning Framework for Semi-supervised Multimodal MRI Synthesis

Kun Yan, Zhizhe Liu, Shuai Zheng, Zhenyu Guo, Zhenfeng Zhu, Yao Zhao
DOI: https://doi.org/10.1007/978-3-031-02444-3_28
2022-01-01
Abstract: Because multimodal data carries more information than single-modal data, it is widely used in the medical domain. For example, different tissues have distinct visual appearances in multimodal MRI, which lets doctors use multimodal MRI to assist in diagnosing a variety of diseases. In practice, however, scan time, acquisition cost, noise contamination, and other factors often leave a patient's multimodal MRI incomplete. Given a set of available multimodal MRI data, our goal is to develop a semi-supervised deep learning model that synthesizes the missing modalities in a coarse-to-fine manner, thereby avoiding the burden of traditional supervised approaches, which require collecting enough paired data to train the synthesis model. (In this paper, semi-supervised learning refers to training with a large amount of unpaired data and a small amount of paired data; unsupervised learning uses only unpaired data, and supervised learning uses only paired data. Paired data refers to MR images of two modalities from the same patient; unpaired data refers to MR images of two modalities from different patients.) Specifically, to capture the differences and the content consistency among modalities with different distributions, large amounts of easily collected unpaired multimodal data are used to establish a cross-modal distribution mapping through adversarial learning, which coarsely generates a reference image for the missing MRI modality. To account for the differences in detail between images of different modalities from the same subject, an enhancement network, trained with only a very small amount of available paired multimodal data, then refines the generated reference image.
Extensive experiments on the BRATS2015 [13] dataset demonstrate that the proposed model outperforms both unsupervised and supervised methods quantitatively and qualitatively, even though supervised methods require a large amount of complete multimodal training data.
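The abstract's distinction between paired and unpaired data can be made concrete with a small sketch. This is only an illustration of the definitions given above, not the authors' data pipeline: the function name `split_pairs`, the `(patient_id, modality, image)` record layout, and the modality labels `"T1"` and `"T2"` are all assumptions introduced here.

```python
from collections import defaultdict
from itertools import product


def split_pairs(scans, src, tgt):
    """Split scans into paired and unpaired (source, target) samples.

    scans: iterable of (patient_id, modality, image) tuples (hypothetical layout).
    Paired   = source and target images from the same patient.
    Unpaired = source and target images from different patients.
    """
    # Group every image by patient, then by modality.
    by_patient = defaultdict(dict)
    for patient_id, modality, image in scans:
        by_patient[patient_id][modality] = image

    patients = sorted(by_patient.items())

    # Same patient has both modalities: a paired sample for the refinement stage.
    paired = [(m[src], m[tgt]) for _, m in patients if src in m and tgt in m]

    # Any source image combined with a target image from another patient:
    # an unpaired sample for the coarse adversarial stage.
    sources = [(pid, m[src]) for pid, m in patients if src in m]
    targets = [(pid, m[tgt]) for pid, m in patients if tgt in m]
    unpaired = [
        (s, t) for (ps, s), (pt, t) in product(sources, targets) if ps != pt
    ]
    return paired, unpaired


if __name__ == "__main__":
    # Toy records; strings stand in for image arrays.
    scans = [
        ("p1", "T1", "a1"),
        ("p1", "T2", "b1"),
        ("p2", "T1", "a2"),
        ("p3", "T2", "b3"),
    ]
    paired, unpaired = split_pairs(scans, "T1", "T2")
    print(len(paired), "paired;", len(unpaired), "unpaired")
```

In the semi-supervised setting described in the abstract, the unpaired list would be large and the paired list small; the sketch simply shows why unpaired samples are easier to collect, since any two patients with one modality each can contribute.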