Prediction of Disease Progression of COVID-19 Based on Machine Learning: A Retrospective Multicentre Cohort Study in Wuhan, China
Fumin Xu, Yongjian Nian, Xiao Chen, Xinru Yin, Qiu, Jingjing Xiao, Liang Qiao, Mi He, Liang Tang, Qi Li, Hu Tan, Li, Guoqiang Cao, Xiawei Li, Qiao Zhang, Yanlin Lv, Shili Xiao, Rong Zhao, Yan Guo, Mingsheng Chen, Dongfeng Chen, Liangzhi Wen, Bin Wang, Kaijun Liu
DOI: https://doi.org/10.2139/ssrn.3578772
2020-01-01
Abstract:
Background: Since December 2019, the outbreak of COVID-19, caused by a novel betacoronavirus, has continued to accelerate throughout the world. The majority of infected individuals develop mild pneumonia, while a proportion of patients progress to severe pneumonia. It is therefore vital to identify patients at high risk of disease progression.
Methods: In this retrospective, multicentre cohort study, patients with laboratory-confirmed COVID-19 from Huoshenshan hospital and Tongji Taikang hospital (Wuhan, China) were included. Clinical features that differed significantly between the severe and nonsevere groups were identified by univariate analysis. These features were then used to build predictive models with machine learning. Two test sets, one from each hospital, were established to evaluate the predictive performance of the trained models. In addition, a software tool was developed for prediction in clinical practice.
Findings: A total of 455 patients were included in this study. Twenty-one features that differed significantly between the severe and nonsevere groups were selected in the training and validation set for modeling. Among the four models, the k-nearest neighbours (KNN) model with an optimal subset of 11 features achieved the highest area under the curve (AUC) in the validation set (0.9484, 95% CI: 0.924-0.973). D-dimer, CRP, and age were the three most important features in the optimal feature subsets selected by K-fold cross-validation. In the test set from Huoshenshan hospital, the support vector machine (SVM) model achieved the highest AUC (0.9594, 95% CI: 0.920-0.999). A software tool for predicting disease progression based on machine learning was developed for clinical practice.
Interpretation: Predictive models were successfully established using machine learning and, with optimal feature subsets, achieved satisfactory performance in predicting disease progression. The predictive models can be conveniently used in clinical practice.
Funding Statement: This work was supported by the National Natural Science Foundation of China (81700483), the Chongqing Research Program of Basic Research and Frontier Technology (cstc2017jcyjAX0302), and the Army Medical University Frontier Technology Research Program (2019XLC3051).
Declaration of Interests: The authors declare that there is no conflict of interest.
Ethics Approval Statement: This study was approved by the ethics committee of Wuhan Huoshenshan hospital (Wuhan, China). As all subjects were anonymized in this retrospective study, written informed consent was waived due to urgent need.
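The Findings report each AUC together with a 95% confidence interval, but the abstract does not say how the intervals were computed. As an illustration only, the following minimal sketch shows one common way to obtain such a figure: the AUC as the fraction of positive-negative pairs ranked correctly, with a percentile bootstrap interval. The function names (`auc_score`, `bootstrap_ci`), the bootstrap method, and the synthetic scores are all assumptions made for this example and are not taken from the study.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC
    (an assumed method; the study may have used a different one)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data for illustration only: 1 = severe, 0 = nonsevere.
data_rng = random.Random(42)
labels = [1] * 40 + [0] * 60
scores = [data_rng.gauss(1.5, 1.0) if y == 1 else data_rng.gauss(0.0, 1.0)
          for y in labels]

auc = auc_score(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC = {auc:.4f}, 95% CI: {low:.3f}-{high:.3f}")
```

In practice, `scores` would be the predicted probabilities of a trained model on a held-out test set. The pairwise AUC computation is quadratic in sample size, which is acceptable for a cohort of a few hundred patients.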