Predicting 30-Day and 1-Year Mortality in Heart Failure with Preserved Ejection Fraction (HFpEF)

Ikgyu Shin, Nilay Bhatt, Alaa Alashi, Keervani Kandala, Karthik Murugiah
DOI: https://doi.org/10.1101/2024.10.15.24315524
2024-10-16
Abstract: Objectives: To develop and compare prediction models for 30-day and 1-year mortality in heart failure with preserved ejection fraction (HFpEF) using electronic health record (EHR) data, utilizing both traditional and machine learning (ML) techniques. Background: HFpEF represents 1 in 2 heart failure patients. Predictive models in HFpEF, specifically those derived from EHR data, are less established. Methods: Using MIMIC-IV EHR data from 2008-2019, patients aged ≥ 18 years admitted with a primary diagnosis of HFpEF were identified using ICD-9 and ICD-10 codes. Demographics, vital signs, prior diagnoses, and lab data were extracted. Data were partitioned into 80% training and 20% test sets. Prediction models from seven model classes (Support Vector Classifier (SVC), Logistic Regression, Lasso Regression, Elastic Net, Random Forest, Histogram-based Gradient Boosting Classifier (HGBC), and XGBoost) were developed using various imputation and oversampling techniques with 5-fold cross-validation. Model performance was compared using several metrics, and individual feature importance was assessed using SHapley Additive exPlanations (SHAP) analysis. Results: Among 3910 hospitalizations for HFpEF, 30-day mortality was 6.3%, and 1-year mortality was 29.2%. Logistic regression performed well for 30-day mortality (area under the receiver operating characteristic curve (AUC) 0.83), whereas Random Forest (AUC 0.79) and HGBC (AUC 0.78) performed well for 1-year mortality. Age and NT-proBNP were the strongest predictors in SHAP analyses for both outcomes. Conclusion: Models derived from EHR data can predict mortality after HFpEF hospitalization with performance comparable to that of models derived from registry or trial data, highlighting the potential for clinical implementation.
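The training protocol in the abstract combines an 80/20 split, imputation, oversampling, and 5-fold cross-validation. The plain-Python sketch below shows one way to arrange these steps so that nothing learned from a validation fold leaks into training. It rests on some assumptions. The paper does not name its imputation or oversampling methods, so median imputation and random duplication of minority-class rows stand in for them here. All helper names (`stratified_folds`, `median_impute`, `random_oversample`, `cv_splits`) are hypothetical.

```python
import random
from statistics import median

# Hypothetical helpers illustrating a leakage-safe protocol. Median imputation
# and random oversampling are assumed stand-ins for the paper's unspecified
# "various imputation and oversampling techniques".


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds with roughly equal event rates in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return [sorted(f) for f in folds]


def median_impute(rows, train_idx):
    """Replace None with per-column medians computed on training rows only."""
    medians = []
    for c in range(len(rows[0])):
        seen = [rows[i][c] for i in train_idx if rows[i][c] is not None]
        medians.append(median(seen) if seen else 0.0)
    return [[medians[c] if v is None else v for c, v in enumerate(r)] for r in rows]


def random_oversample(train_idx, labels, seed=0):
    """Duplicate random minority-class training indices until classes are balanced."""
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(train_idx)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(train_idx) + extra


def cv_splits(labels, k=5, seed=0):
    """Yield (balanced_train_idx, val_idx) pairs; validation folds are never resampled."""
    folds = stratified_folds(labels, k, seed)
    for v in range(k):
        train = [i for f_i, f in enumerate(folds) if f_i != v for i in f]
        yield random_oversample(train, labels, seed), folds[v]
```

To get an 80/20 split, you could hold out one of five stratified folds as the test set and run `cv_splits` on the remaining 80% only. Inside each fold, `median_impute` would receive that fold's training indices. Stratification matters here because of the low event rate: with 30-day mortality at 6.3%, an unstratified fold could end up with very few deaths.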
Cardiovascular Medicine
What problem does this paper attempt to address?
This paper addresses the problem of predicting mortality 30 days and 1 year after hospitalization in patients with heart failure with preserved ejection fraction (HFpEF). Specifically, it aims to develop and compare prediction models for 30-day and 1-year mortality from electronic health record (EHR) data, using both traditional statistical methods and machine-learning techniques.

### Research Background

HFpEF accounts for about half of all heart failure patients and is associated with a high hospitalization rate and poor prognosis. However, relatively few prediction models exist for HFpEF patients, especially models based on EHR data. Existing prediction models are mostly derived from registry or clinical trial data. These sources often lack the real-time clinical indicators recorded in the EHR, which limits their use in routine clinical settings.

### Research Methods

- **Data Source**: EHR data from the MIMIC-IV database (2008-2019) were used to identify hospitalized patients aged ≥18 years with a primary diagnosis of HFpEF.
- **Feature Extraction**: Demographic information, vital signs, prior diagnoses, and laboratory data were extracted.
- **Model Selection**: Seven model classes were used: support vector classifier (SVC), logistic regression, Lasso regression, Elastic Net, random forest, histogram-based gradient boosting classifier (HGBC), and XGBoost.
- **Evaluation Metrics**: Model performance was evaluated with accuracy, sensitivity, specificity, area under the ROC curve (AUC), area under the precision-recall curve (PR-AUC), calibration curves, the Matthews correlation coefficient (MCC), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC).
- **Feature Importance**: SHapley Additive exPlanations (SHAP) analysis was used to assess the importance of each feature.

### Main Findings

- **30-Day Mortality**: The logistic regression model performed best, with an AUC of 0.83.
- **1-Year Mortality**: Random forest (AUC 0.79) and the histogram-based gradient boosting classifier (AUC 0.78) performed best.
- **Key Predictors**: Age and NT-proBNP levels were the most important predictors at both time points. Other important features included white blood cell count, troponin, and bicarbonate levels.

### Conclusions

Prediction models based on EHR data can effectively predict 30-day and 1-year mortality in HFpEF patients, with performance comparable to that of models based on registry or clinical trial data. These models could be implemented in clinical practice, where they may help identify high-risk patients, optimize resource allocation, and support policymakers in risk adjustment.
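Most of the evaluation metrics listed under Research Methods reduce to short formulas over true labels and predicted probabilities. The dependency-free Python sketch below computes them. The function names are hypothetical, and the 0.5 decision threshold is an assumption, since the paper's operating threshold is not stated here. Average precision stands in as one common estimator of PR-AUC. AIC and BIC are meaningful only for likelihood-based models such as logistic regression, where `k` is the number of fitted parameters.

```python
import math


def roc_auc(labels, scores):
    """ROC AUC: probability a random positive scores above a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """PR-AUC estimated as the mean precision at the rank of each positive.

    Tied scores keep their input order (an assumption; libraries handle ties differently).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank
    return total / sum(labels)


def confusion(labels, scores, threshold=0.5):
    """Counts (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and Matthews correlation coefficient."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def log_likelihood(labels, probs, eps=1e-12):
    """Bernoulli log-likelihood of the labels under the predicted probabilities."""
    ll = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        ll += math.log(p) if y == 1 else math.log(1 - p)
    return ll


def aic(ll, k):
    """Akaike information criterion for a model with k fitted parameters."""
    return 2 * k - 2 * ll


def bic(ll, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * ll
```

MCC and PR-AUC are informative here because 30-day deaths made up only 6.3% of admissions. At that event rate, a model that predicts "survived" for everyone already reaches about 94% accuracy, yet its sensitivity and MCC are zero.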