Abstract: Background: Despite careful patient selection and improved supportive care in recent years, chemotherapy for acute myeloid leukemia (AML) still carries a significant risk of treatment-related mortality (5-20%), resulting from an interplay of multiple patient- and disease-related factors. Tools that better identify candidates suitable for intensive chemotherapy are much needed. In this analysis of a large administrative database, we evaluate whether machine learning (ML) algorithms trained on factors available at the time of admission for AML therapy can predict death during the hospitalization. Methods: We used the State Inpatient Database (SID) for 2008-2014, which holds one of the largest collections of inpatient discharges incorporating all payers from community hospitals in the United States and is part of the Healthcare Cost and Utilization Project (HCUP). Data from the following states were obtained for analysis: Arizona, Florida, New York, Maryland, Washington, and New Jersey. Our cohort included adult patients (age >17) diagnosed with AML (ICD-9 codes 205.XX, 206.XX, and 207.XX) who received any type of chemotherapy during the same admission, confirmed by an ICD-9 diagnosis code (V581, V5811, or V5812) or ICD-9 procedure code 9925. The primary objective was to predict inpatient mortality in AML patients undergoing chemotherapy using covariates that were present before chemotherapy initiation. Features included age, race, emergency room use, year of admission, number of days from admission to chemotherapy, comorbid conditions present at the time of admission, and procedures performed on or before the day of chemotherapy administration. The main cohort was split into training (80%) and test (20%) sets. We compared several supervised machine learning classification algorithms, including logistic regression (LR), decision trees (DT), and random forests (RF).
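The inclusion criteria and the 80/20 split described above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout (`age`, `dx_codes`, `proc_codes`), the helper names, and the assumption that codes are stored as strings without the decimal point are all hypothetical, and the split shown is a plain random split because the abstract does not say whether stratification was used.

```python
import random

# Assumption: codes are stored as strings without the decimal point
# (so ICD-9 205.00 appears as "20500"); all field names are hypothetical.
AML_PREFIXES = ("205", "206", "207")
CHEMO_DX_CODES = {"V581", "V5811", "V5812"}
CHEMO_PROC_CODE = "9925"


def in_cohort(record):
    """Return True if a discharge record meets the stated inclusion criteria."""
    is_adult = record["age"] > 17
    has_aml = any(code.startswith(AML_PREFIXES) for code in record["dx_codes"])
    has_chemo = (
        any(code in CHEMO_DX_CODES for code in record["dx_codes"])
        or CHEMO_PROC_CODE in record["proc_codes"]
    )
    return is_adult and has_aml and has_chemo


def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up records for illustration only; not drawn from the SID.
records = [
    {"age": 64, "dx_codes": ["20500", "V5811"], "proc_codes": []},
    {"age": 45, "dx_codes": ["20600"], "proc_codes": ["9925"]},
    {"age": 16, "dx_codes": ["20500", "V5811"], "proc_codes": []},  # under 18
    {"age": 70, "dx_codes": ["20410", "V5811"], "proc_codes": []},  # not AML
    {"age": 58, "dx_codes": ["20700"], "proc_codes": []},  # no chemotherapy
]
cohort = [r for r in records if in_cohort(r)]

# Repeat the two eligible records to get 10 rows: 8 for training, 2 for test.
train, test = split_train_test(cohort * 5)
```

Only the first two toy records pass the filter; the others fail on age, diagnosis, or chemotherapy evidence respectively.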
Algorithms were trained using 5-fold cross-validation, with hyperparameters selected via grid search to prevent overfitting. Model performance on the test set was assessed using the area under the receiver operating characteristic (ROC) curve (AUC). True positive rate (TPR), true negative rate (TNR), and positive predictive value (PPV) were assessed at multiple thresholds. SAS 9.4 and Python libraries were used for all analyses. Results: A total of 29613 subjects with AML were included in the final analysis, each associated with 4177 features after including indicators to capture missing categorical variables. Median age was 58.9 (18-101) years. 13689 (53.7%) were male and 20203 (69%) were Caucasian. Each subject underwent some form of chemotherapy. Mean time from admission to starting chemotherapy was 3 days (95% CI, 2.9-3.1). Among all subjects, 2682 (9.1%) died during the hospitalization following chemotherapy administration. Figure 1 shows the ROC curves for all algorithms. Both LR and RF achieved an AUC of 0.78, while DT achieved an AUC of 0.70. In comparison, a baseline LR model with age as the sole predictor yielded an AUC of 0.62. Table 1 provides TPR, TNR, and PPV for each algorithm at varying decision thresholds. Discussion: The strength of this machine learning approach is that it uses readily accessible patient-level variables to predict inpatient mortality for any patient on track for chemotherapy to treat AML, without incorporating performance status or any laboratory information. Using a threshold of 0.7, our trained RF model achieves a TNR of 99.2%, a TPR of 8.6%, and a PPV of 57.3%. If this threshold were used to select patients suitable for chemotherapy, 51 of the 587 patients who died in our test set of 5923 could have avoided treatment-related mortality, while 38 patients would have been falsely flagged and would not have received chemotherapy.
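To make the threshold arithmetic concrete, the following Python sketch recomputes TPR, TNR, and PPV from the counts quoted above (51 flagged deaths, 38 falsely flagged survivors, 587 deaths, 5923 test patients). The function names are hypothetical, and flagging a patient when the score is greater than or equal to the threshold is an assumption, since the abstract does not state the comparison used.

```python
def rates_from_counts(tp, fp, fn, tn):
    """Return (TPR, TNR, PPV) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # deaths correctly flagged / all deaths
    tnr = tn / (tn + fp)  # survivors correctly not flagged / all survivors
    ppv = tp / (tp + fp)  # deaths among all flagged patients
    return tpr, tnr, ppv


def counts_at_threshold(scores, labels, threshold):
    """Count (TP, FP, FN, TN), flagging a patient when score >= threshold.

    Assumption: `labels` uses 1 for in-hospital death and 0 for survival.
    """
    tp = fp = fn = tn = 0
    for score, died in zip(scores, labels):
        flagged = score >= threshold
        if flagged and died:
            tp += 1
        elif flagged:
            fp += 1
        elif died:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Counts quoted in the Discussion for the RF model at a threshold of 0.7.
test_size, deaths = 5923, 587
tp, fp = 51, 38
fn = deaths - tp              # 536 deaths that were not flagged
tn = test_size - deaths - fp  # 5298 survivors that were not flagged

tpr, tnr, ppv = rates_from_counts(tp, fp, fn, tn)
print(f"TPR={tpr:.1%}  TNR={tnr:.1%}  PPV={ppv:.1%}")
```

Recomputed from these whole-number counts, the rates agree with the reported 8.6%, 99.2%, and 57.3% to within about 0.1 percentage point; the small differences in TPR and TNR are presumably due to rounding in the reported figures.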
Our study supports the use of supervised machine learning algorithms on large administrative databases to create healthcare solutions. One limitation is that this dataset cannot differentiate discharges following induction therapy in newly diagnosed patients from those following consolidation. Next steps would be to validate the models on larger cohorts with more detailed therapy information. Ultimately, estimating inpatient mortality at the time of hospitalization may help clinicians identify high-risk patients for whom alternative treatment options would offer better outcomes than chemotherapy. Disclosures: No relevant conflicts of interest to declare.
