Efficient and Effective Training of COVID-19 Classification Networks With Self-Supervised Dual-Track Learning to Rank

Yuexiang Li,Dong Wei,Jiawei Chen,Shilei Cao,Hongyu Zhou,Yanchun Zhu,Jianrong Wu,Lan Lan,Wenbo Sun,Tianyi Qian,Kai Ma,Haibo Xu,Yefeng Zheng
DOI: https://doi.org/10.1109/jbhi.2020.3018181
IF: 7.7
2020-10-01
IEEE Journal of Biomedical and Health Informatics
Abstract:Coronavirus Disease 2019 (COVID-19) has rapidly spread worldwide since first reported. Timely diagnosis of COVID-19 is crucial both for disease control and patient care. Non-contrast thoracic computed tomography (CT) has been identified as an effective tool for the diagnosis, yet the disease outbreak has placed tremendous pressure on radiologists for reading the exams and may potentially lead to fatigue-related mis-diagnosis. Reliable automatic classification algorithms can be really helpful; however, they usually require a considerable number of COVID-19 cases for training, which is difficult to acquire in a timely manner. Meanwhile, how to effectively utilize the existing archive of non-COVID-19 data (the negative samples) in the presence of severe class imbalance is another challenge. In addition, the sudden disease outbreak necessitates fast algorithm development. In this work, we propose a novel approach for effective and efficient training of COVID-19 classification networks using a small number of COVID-19 CT exams and an archive of negative samples. Concretely, a novel self-supervised learning method is proposed to extract features from the COVID-19 and negative samples. Then, two kinds of soft-labels ('difficulty' and 'diversity') are generated for the negative samples by computing the earth mover's distances between the features of the negative and COVID-19 samples, from which data 'values' of the negative samples can be assessed. A pre-set number of negative samples are selected accordingly and fed to the neural network for training. Experimental results show that our approach can achieve superior performance using about half of the negative samples, substantially reducing model training time.
computer science, interdisciplinary applications,mathematical & computational biology,medical informatics, information systems
What problem does this paper attempt to address?