Simultaneous Imputation and Prediction with High-dimensional Data (SIP-HD): A Deep Learning Model for Disease Diagnosis

Zhenzhen Jia, Jianqiang Hu, Kejia Hu, Qingchen Wang, Ning Zhang
DOI: https://doi.org/10.2139/ssrn.3985872
2021-01-01
SSRN Electronic Journal
Abstract: Accurate diagnosis directly affects service quality and resource allocation in healthcare operations. Advanced medical tests can improve doctors' diagnostic accuracy, but they are invasive and costly, and they are sometimes infeasible in certain geographic locations or for patients in certain health conditions. We find that machine learning models make better use of advanced medical test results for disease prediction than doctors do. To achieve good diagnostic performance when advanced medical test results are missing, we propose SIP-HD, a deep learning diagnostic model that simultaneously performs imputation and prediction with high-dimensional data. Our model achieves higher accuracy than traditional two-step machine learning models, which first impute missing data using mean or K-nearest neighbors (KNN) imputation and then apply a classifier such as logistic regression (LR) or a light gradient boosting machine (LGB). It also outperforms doctors' preliminary diagnoses, which rely on limited advanced medical test results. Compared with advanced medical tests, our model is a low-cost, non-invasive way to deliver high-quality diagnoses, especially for patients living in rural areas. Finally, our model can assist doctors in diagnosing patients via telehealth.
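The abstract benchmarks SIP-HD against a traditional two-step baseline: first fill in missing test values (mean or KNN imputation), then train a classifier on the completed data. The pure-Python sketch below illustrates only that baseline, under stated assumptions. The function names (`mean_impute`, `knn_impute`, `fit_logistic`, `two_step_pipeline`) and the toy data are hypothetical. Logistic regression is fit by plain batch gradient descent. The KNN distance (root-mean-squared difference over features observed in both rows) is one choice among several. LGB is not shown. This is not the authors' code, and SIP-HD's joint imputation-and-prediction architecture is not reproduced.

```python
import math

# Hypothetical toy data: column 0 = a routine measurement,
# column 1 = an "advanced test" result that is sometimes missing (None).
TRAIN_X = [[-2.0, None], [-1.0, -1.0], [1.0, 1.0], [2.0, None]]
TRAIN_Y = [0, 0, 1, 1]
TEST_X = [[-1.5, None], [1.5, None]]


def mean_impute(train_rows, rows):
    """Step 1 (mean): replace None with the column mean of the training rows."""
    means = []
    for j in range(len(train_rows[0])):
        observed = [r[j] for r in train_rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(train_rows, rows, k=3):
    """Step 1 (KNN): fill a missing entry with the mean of that column over the
    k nearest training rows. Distance (an assumption) is the root-mean-squared
    difference over columns observed in both rows."""
    out = []
    for r in rows:
        filled = list(r)
        for j, v in enumerate(r):
            if v is not None:
                continue
            candidates = []
            for t in train_rows:
                if t[j] is None:
                    continue  # neighbor must have the value we need
                shared = [(a - b) ** 2 for a, b in zip(r, t)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((math.sqrt(sum(shared) / len(shared)), t[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [val for _, val in candidates[:k]]
            filled[j] = sum(nearest) / len(nearest) if nearest else 0.0
        out.append(filled)
    return out


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Step 2: logistic regression by batch gradient descent on log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    w, b = model
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def two_step_pipeline(train_X, train_y, test_X, imputer="mean"):
    """Impute first (statistics from training rows only), then classify."""
    impute = mean_impute if imputer == "mean" else knn_impute
    model = fit_logistic(impute(train_X, train_X), train_y)
    return predict(model, impute(train_X, test_X))


if __name__ == "__main__":
    print(two_step_pipeline(TRAIN_X, TRAIN_Y, TEST_X, "mean"))
    print(two_step_pipeline(TRAIN_X, TRAIN_Y, TEST_X, "knn"))
```

Imputation statistics come from the training rows only, so test patients do not leak into the fill-in values. Because the classifier sees only the imputed numbers, any error made in the first step passes unchanged into the second. That decoupling is one plausible reason a model that imputes and predicts jointly could do better.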