Development of interpretable machine learning models to predict in‐hospital prognosis of acute heart failure patients
Munekazu Tanaka, Hirohiko Kohjitani, Erika Yamamoto, Takeshi Morimoto, Takao Kato, Hidenori Yaku, Yasutaka Inuzuka, Yodo Tamaki, Neiko Ozasa, Yuta Seko, Masayuki Shiba, Yusuke Yoshikawa, Yugo Yamashita, Takeshi Kitai, Ryoji Taniguchi, Moritake Iguchi, Kazuya Nagao, Takafumi Kawai, Akihiro Komasa, Yuichi Kawase, Takashi Morinaga, Mamoru Toyofuku, Yutaka Furukawa, Kenji Ando, Kazushige Kadota, Yukihito Sato, Koichiro Kuwahara, Yasushi Okuno, Takeshi Kimura, Koh Ono, the KCHF Study Investigators
DOI: https://doi.org/10.1002/ehf2.14834
2024-05-17
ESC Heart Failure
Abstract:
Aims: In recent years, machine learning (ML) models have developed remarkably, with a trend towards high prediction performance. However, ML models with high prediction performance often become structurally complex and are frequently perceived as black boxes, which hinders intuitive interpretation of their predictions. We aimed to develop ML models with high prediction performance, interpretability, and superior risk stratification to predict in‐hospital mortality and worsening heart failure (WHF) in patients with acute heart failure (AHF).

Methods and results: Based on the Kyoto Congestive Heart Failure (KCHF) registry, which enrolled 4056 patients with AHF, we developed prediction models for in‐hospital mortality and WHF using information obtained on the first day of admission (demographics, physical examination findings, blood test results, etc.). After excluding 16 patients who died on the first or second day of admission, the original dataset (n = 4040) was split 4:1 into a training dataset (n = 3232) and a test dataset (n = 808). Using the training dataset, we developed three types of prediction models: (i) a classification and regression tree (CART) model; (ii) a random forest (RF) model; and (iii) an extreme gradient boosting (XGBoost) model. The performance of each model was evaluated on the test dataset using metrics including sensitivity, specificity, area under the receiver operating characteristic curve (AUC), Brier score, and calibration slope. To interpret the complex structure of the XGBoost model, we performed SHapley Additive exPlanations (SHAP) analysis and classified patients into interpretable clusters. In the original dataset, 44.8% (1809/4040) of patients were female, and the mean age was 77.9 ± 12.0 years. The in‐hospital mortality rate was 6.3% (255/4040) and the WHF rate was 22.3% (900/4040) in the total study population.
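As a rough illustration of the evaluation metrics listed above, the following pure-Python sketch computes sensitivity, specificity, AUC, Brier score, and calibration slope from predicted probabilities and binary outcomes. The function names, the 0.5 decision threshold, and the toy data are hypothetical; the abstract does not state how thresholds were chosen or which software the authors used. The calibration slope is estimated in a common way, as the coefficient from a logistic regression of the observed outcome on the logit of the predicted probability, here fitted by Newton-Raphson.

```python
import math
from typing import Sequence, Tuple


def sensitivity_specificity(
    y: Sequence[int], p: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity after labelling p >= threshold as a predicted event (threshold is an assumption)."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUC as the probability that a random event is ranked above a random non-event (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def calibration_slope(y: Sequence[int], p: Sequence[float], iters: int = 50) -> float:
    """Slope b in logit P(y=1) = a + b * logit(p); assumes every p lies strictly in (0, 1)."""
    x = [math.log(pi / (1 - pi)) for pi in p]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iters):
        # Score vector (ga, gb) and observed information matrix [[haa, hab], [hab, hbb]]
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            ga += yi - mu
            gb += (yi - mu) * xi
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        # Newton-Raphson update: theta <- theta + I^{-1} g
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return b


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(sensitivity_specificity(y, p))  # (0.5, 1.0)
    print(auc(y, p))  # 0.75
    print(brier_score(y, p))
```

A slope below 1 suggests predictions that are too extreme (typical of overfitting), and a slope above 1 suggests predictions that are too conservative.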
In the in‐hospital mortality prediction, the XGBoost model achieved an AUC of 0.816 [95% confidence interval (CI): 0.815–0.818], surpassing the CART model (0.683, 95% CI: 0.680–0.685) and the RF model (0.755, 95% CI: 0.753–0.757). Similarly, in the WHF prediction, the XGBoost model achieved an AUC of 0.766 (95% CI: 0.765–0.768), outperforming the CART model (0.688, 95% CI: 0.686–0.689) and the RF model (0.713, 95% CI: 0.711–0.714). In the XGBoost model, interpretable clusters were formed, and within each cluster the rates of in‐hospital mortality and WHF were similar between the training and test datasets.

Conclusions: XGBoost models combined with SHAP analysis provide high prediction performance, interpretability, and reproducible risk stratification for in‐hospital mortality and WHF in patients with AHF.
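The abstract does not detail how the SHAP values were computed. As a hedged illustration of what a SHAP value is, the sketch below computes exact Shapley values by enumerating every feature coalition for a hypothetical three-feature log-odds model (`toy_margin`, with made-up features and coefficients), using the interventional definition in which absent features are filled in from background rows. This brute-force enumeration is exponential in the number of features; for an XGBoost model one would typically use the polynomial-time TreeSHAP algorithm instead, and the clustering step mentioned above is not reproduced here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def toy_margin(x: Sequence[float]) -> float:
    """Hypothetical log-odds risk model: standardized age, low-SBP flag, high-BUN flag, one interaction."""
    age_z, low_sbp, high_bun = x
    return -2.5 + 0.6 * age_z + 0.9 * low_sbp + 0.7 * high_bun + 0.4 * low_sbp * high_bun


def coalition_value(
    f: Model, x: Sequence[float], background: Sequence[Sequence[float]], subset: Set[int]
) -> float:
    """Mean model output when features in `subset` come from x and the rest from each background row."""
    total = 0.0
    for z in background:
        mixed = [x[j] if j in subset else z[j] for j in range(len(x))]
        total += f(mixed)
    return total / len(background)


def exact_shapley(
    f: Model, x: Sequence[float], background: Sequence[Sequence[float]]
) -> List[float]:
    """Shapley value of each feature: weighted average of its marginal contribution over all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                with_i = coalition_value(f, x, background, set(s) | {i})
                without_i = coalition_value(f, x, background, set(s))
                phi[i] += weight * (with_i - without_i)
    return phi


if __name__ == "__main__":
    patient = [1.0, 1.0, 1.0]
    baseline = [[0.0, 0.0, 0.0]]
    # The interaction term 0.4 is split equally between low_sbp and high_bun
    print(exact_shapley(toy_margin, patient, baseline))
```

The attributions are additive: the base value (mean model output over the background rows) plus the sum of a patient's Shapley values equals that patient's model output. Per-patient vectors of this kind are the sort of input a SHAP-based clustering step would operate on.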
cardiac & cardiovascular systems