Abstract
Background: Despite careful patient selection and improvements in supportive care in recent years, chemotherapy for acute myeloid leukemia (AML) is still associated with a significant risk of treatment-related mortality (5-20%), which results from an interplay of multiple patient- and disease-related factors. Tools that better identify candidates suitable for intensive chemotherapy are much needed. In this analysis of a large administrative database, we evaluate whether machine learning (ML) algorithms trained on factors available at the time of admission for AML therapy can predict death during the hospitalization.
Methods: We used the State Inpatient Database (SID) for 2008-2014, which holds one of the largest collections of all-payer inpatient discharge records from community hospitals in the United States and is part of the Healthcare Cost and Utilization Project (HCUP). Data from the following states were obtained for analysis: Arizona, Florida, New York, Maryland, Washington, and New Jersey. Our cohort included adult patients (age >17 years) diagnosed with AML (ICD-9 codes 205.XX, 206.XX, and 207.XX) who received any type of chemotherapy during the same admission, as confirmed by ICD-9 diagnosis codes (V581, V5811, and V5812) or ICD-9 procedure code 9925. The primary objective was to predict inpatient mortality in AML patients undergoing chemotherapy using covariates present before chemotherapy initiation. Features included age, race, emergency room use, year of admission, number of days from admission to chemotherapy, comorbid conditions present at the time of admission, and procedures performed on or before the day of chemotherapy administration. The main cohort was split into training (80%) and test (20%) sets. We compared several supervised ML classification algorithms, including logistic regression (LR), decision trees (DT), and random forests (RF).
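The training and evaluation workflow described in the Methods (80/20 split, 5-fold cross-validated grid search, test-set AUC for LR, DT, and RF) could be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn as the Python library, substitutes synthetic data from `make_classification` for the SID-derived features, and uses hypothetical hyperparameter grids, since the abstract does not report these details.

```python
# Minimal sketch, assuming scikit-learn. Synthetic data stands in for the
# SID-derived feature matrix, which is not reproduced here.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary outcome (about 9% positives) as a placeholder
# for in-hospital death; sample and feature counts are illustrative only.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           weights=[0.91], random_state=0)

# 80/20 train/test split, stratified so both sets keep a similar death rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hypothetical hyperparameter grids (assumption: not reported in the abstract).
candidates = {
    "LR": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0]}),
    "DT": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, 10]}),
    "RF": (RandomForestClassifier(n_estimators=100, random_state=0),
           {"max_depth": [5, 10, None]}),
}

test_auc = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated grid search on the training set, scored by ROC AUC;
    # the best configuration is refit on the full training set by default.
    search = GridSearchCV(estimator, grid, cv=5, scoring="roc_auc")
    search.fit(X_train, y_train)
    # Score the held-out test set once with the refit best model.
    scores = search.predict_proba(X_test)[:, 1]
    test_auc[name] = roc_auc_score(y_test, scores)

for name, auc in test_auc.items():
    print(f"{name}: test AUC = {auc:.2f}")
```

Keeping the test set out of the grid search means the reported AUC reflects data the tuning never saw; because AUC is threshold-free, threshold-specific TPR, TNR, and PPV still have to be computed separately.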
Algorithms were trained using 5-fold cross-validation, with hyperparameters selected via grid search to limit overfitting. Model performance on the test set was assessed using the area under the receiver operating characteristic (ROC) curve (AUC). True positive rate (TPR), true negative rate (TNR), and positive predictive value (PPV) were assessed at multiple decision thresholds. SAS 9.4 and Python libraries were used for all analyses.
Results: A total of 29,613 subjects with AML were included in the final analysis, each associated with 4,177 features after adding indicators for missing categorical variables. Median age was 58.9 years (range, 18-101); 13,689 (53.7%) were male and 20,203 (69%) were Caucasian. Each subject underwent some form of chemotherapy. Mean time from admission to the start of chemotherapy was 3 days (95% CI, 2.9-3.1). Overall, 2,682 subjects (9.1%) died during the hospitalization following chemotherapy administration. Figure 1 shows the ROC curves comparing all algorithms. LR and RF each achieved an AUC of 0.78, whereas DT achieved an AUC of 0.70. In comparison, a baseline LR model with age as the sole predictor yielded an AUC of 0.62. Table 1 provides TPR, TNR, and PPV for each algorithm at varying decision thresholds.
Discussion: The strength of this ML approach is that it uses readily accessible, patient-specific variables to predict inpatient mortality for any patient on track for chemotherapy for AML, without incorporating performance status or any laboratory information. At a threshold of 0.7, our trained RF model achieves a TNR of 99.2%, a TPR of 8.6%, and a PPV of 57.3%. If this threshold were used to select patients for chemotherapy, 51 of the 587 patients who died in our test set of 5,923 might have avoided treatment-related mortality, while 38 patients would have been falsely flagged and would not have received chemotherapy.
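The threshold-0.7 figures for the RF model can be cross-checked from the counts stated in the abstract: with 587 deaths among 5,923 test patients, 51 true positives, and 38 false positives, the confusion matrix is fully determined. The helper name `threshold_metrics` is hypothetical and used only for this illustration.

```python
# Worked check of the RF threshold-0.7 figures, using only counts
# stated in the abstract. `threshold_metrics` is a hypothetical helper.
def threshold_metrics(tp, fn, fp, tn):
    """Return (TPR, TNR, PPV) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity: flagged deaths / all deaths
    tnr = tn / (tn + fp)  # specificity: unflagged survivors / all survivors
    ppv = tp / (tp + fp)  # precision: deaths among flagged patients
    return tpr, tnr, ppv

test_set_size = 5923
deaths = 587
tp = 51                            # deaths correctly flagged
fp = 38                            # survivors falsely flagged
fn = deaths - tp                   # 536 deaths not flagged
tn = test_set_size - deaths - fp   # 5298 survivors not flagged

tpr, tnr, ppv = threshold_metrics(tp, fn, fp, tn)
print(f"TPR={tpr:.1%} TNR={tnr:.1%} PPV={ppv:.1%}")
# prints: TPR=8.7% TNR=99.3% PPV=57.3%
```

The recomputed TPR and TNR differ from the reported 8.6% and 99.2% by less than 0.1 percentage point, which is consistent with the abstract truncating rather than rounding; PPV (51 of 89 flagged patients) matches exactly.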
Our study supports the use of supervised ML algorithms on large administrative databases to develop clinical risk-prediction tools. One limitation is that this dataset cannot differentiate discharges following induction therapy in newly diagnosed patients from those following consolidation. Next steps are to validate the models in larger cohorts with more detailed therapy information. Ultimately, estimating inpatient mortality at the time of hospitalization may help clinicians identify high-risk patients for whom alternative treatment options might yield better outcomes than chemotherapy.
Disclosures: No relevant conflicts of interest to declare.

Supervised Machine Learning Algorithms Using Patient Related Factors to Predict in-Hospital Mortality Following Acute Myeloid Leukemia Therapy
