Abstract:

Background: During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model.

Objective: In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients’ chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model.

Methods: Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients’ data were used to build 20 machine learning models with various algorithms via autoML. The performance of the machine learning models was measured by the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanation and partial dependence plots to identify and rank the variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained by using only these 10 variables, and the output models were evaluated against the model that used 48 variables.

Results: Data from 4313 patients were used to develop the models. The best model that was generated by using autoML and 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boost machine and extreme gradient boost models, which had AUPRCs of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791).

Conclusions: We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method for generating machine learning–based clinical decision support tools.
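
Because every model above is compared by AUPRC, it may help to see how that number can be computed. The following is a minimal sketch in plain Python, not the authors' pipeline: it uses the step-wise "average precision" estimate of the area under the precision-recall curve, assumes there are no tied scores, and the function name, labels, and scores are hypothetical illustration values. A useful reference point is that a model ranking patients at random has an expected AUPRC roughly equal to the prevalence of the positive class, which is one reason this metric is often preferred for rare outcomes.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    y_true:  1 for the positive class, 0 otherwise (hypothetical labels).
    y_score: predicted probability of the positive class.
    Assumes no tied scores; ties would be broken by input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank cases from highest to lowest predicted score.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)

    true_positives = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            # Each positive raises recall by 1/n_pos; weight that step
            # by the precision observed at this rank.
            area += (true_positives / rank) / n_pos
    return area


if __name__ == "__main__":
    # Toy data (made up): 2 positives among 5 cases, prevalence 0.4.
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.1]
    print(round(average_precision(labels, scores), 3))  # 0.833
```

In this toy example the positives sit at ranks 1 and 3, so the estimate is (1/1 + 2/3) / 2, about 0.833, against a chance level of about 0.4.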

Artificial Intelligence (AI) Based Prediction of Mortality, for COVID-19 Patients

Using Machine Learning to Predict Mortality for COVID-19 Patients on Day 0 in the ICU

Comparing machine learning algorithms for predicting COVID-19 mortality

Using machine learning in prediction of ICU admission, mortality, and length of stay in the early stage of admission of COVID-19 patients

Predictive Value of Machine Learning Models in Mortality of Coronavirus Disease 2019 (COVID-19) Pneumonia

Machine Learning Based Clinical Decision Support System for Early COVID-19 Mortality Prediction

Sexual selection in a monomorphic lek-breeding bird: correlates of male mating success in the great snipe Gallinago media

Prediction of COVID-19 Hospitalization and Mortality Using Artificial Intelligence

Artificial intelligence for predicting mortality in hospitalized COVID-19 patients

Machine learning algorithms for predicting determinants of COVID-19 mortality in South Africa

An Early Warning Tool for Predicting Mortality Risk of COVID-19 Patients Using Machine Learning

Predicting Intensive Care Unit Length of Stay and Mortality Using Patient Vital Signs: Machine Learning Model Development and Validation

Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study

Prediction of ICU Patients’ Deterioration Using Machine Learning Techniques

Artificial Intelligence-Based Models for Prediction of Mortality in ICU Patients: A Scoping Review

Machine learning prediction for mortality of patients diagnosed with COVID-19: a nationwide Korean cohort study

Development and validation of a machine learning-based prediction model for near-term in-hospital mortality among patients with COVID-19

Early Prediction of Mortality Risk among Patients with Severe COVID-19, Using Machine Learning

Prediction of COVID-19 Patient using Supervised Machine Learning Algorithm

Deep Learning Model Utilization for Mortality Prediction in Mechanically Ventilated ICU Patients

A machine learning-based prediction of hospital mortality in mechanically ventilated ICU patients