Data-driven, two-stage machine learning algorithm-based prediction scheme for assessing 1-year and 3-year mortality risk in chronic hemodialysis patients

Wen-Teng Lee, Yu-Wei Fang, Wei-Shan Chang, Kai-Yuan Hsiao, Ben-Chang Shia, Mingchih Chen, Ming-Hsien Tsai
DOI: https://doi.org/10.1038/s41598-023-48905-9
IF: 4.6
2023-12-06
Scientific Reports
Abstract: Life expectancy is likely to be substantially reduced in patients undergoing chronic hemodialysis (CHD). However, machine learning (ML) may identify risk factors for mortality in these patients by analyzing serum laboratory data from routine dialysis care. This study aimed to establish a mortality prediction model for CHD patients by adopting a two-stage ML algorithm-based prediction scheme, combined with the importance of risk factors identified by different ML methods. This is a retrospective, observational cohort study. We included 800 patients undergoing CHD between December 2006 and December 2012 at Shin-Kong Wu Ho-Su Memorial Hospital. The study analyzed laboratory data comprising 44 indicators. We used five ML methods, namely logistic regression (LGR), decision tree (DT), random forest (RF), gradient boosting (GB), and eXtreme gradient boosting (XGB), to develop the two-stage ML algorithm-based prediction scheme and to evaluate the important factors that predict CHD mortality. LGR served as the benchmark method. On the validation and testing datasets for the 1- and 3-year mortality prediction models, RF had the best accuracy and area under the curve among the five ML methods. The stepwise RF model, which incorporates the most important factors of CHD mortality risk based on their average rank across DT, RF, GB, and XGB, exhibited superior predictive performance compared with LGR in predicting mortality among CHD patients over both 1-year and 3-year periods. We developed a two-stage ML algorithm-based prediction scheme by implementing the stepwise RF, which demonstrated satisfactory performance in predicting mortality in patients undergoing CHD over 1- and 3-year periods. The findings of this study can offer valuable information to nephrologists, enhancing patient-centered decision-making and increasing awareness of laboratory values that signal risk, particularly for patients with a high short-term mortality risk.
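The average-rank step described in the abstract can be illustrated with a minimal sketch. It assumes each of the four tree-based methods (DT, RF, GB, XGB) has already produced one importance score per feature. The feature names and scores below are hypothetical placeholders, not values from the study, and the helper names `rank_features` and `average_rank` are introduced only for illustration. The sketch covers only the rank aggregation, not the model fitting or the stepwise RF evaluation.

```python
from statistics import mean


def rank_features(importances):
    """Return {feature: rank}; rank 1 is the feature with the highest score.

    Ties are broken by insertion order because sorted() is stable.
    """
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def average_rank(importances_by_model):
    """Average each feature's rank across models; lower means more important."""
    per_model_ranks = [rank_features(s) for s in importances_by_model.values()]
    features = per_model_ranks[0].keys()
    return {f: mean(r[f] for r in per_model_ranks) for f in features}


# Hypothetical importance scores (NOT results from the paper).
toy_importances = {
    "DT":  {"feature_a": 0.50, "feature_b": 0.30, "feature_c": 0.20},
    "RF":  {"feature_a": 0.40, "feature_b": 0.25, "feature_c": 0.35},
    "GB":  {"feature_a": 0.45, "feature_b": 0.35, "feature_c": 0.20},
    "XGB": {"feature_a": 0.30, "feature_b": 0.20, "feature_c": 0.50},
}

avg = average_rank(toy_importances)
# Features ordered from most to least important by average rank.
ordered_features = sorted(avg, key=avg.get)
print(avg)
print(ordered_features)
```

In a stepwise scheme of the kind the abstract describes, an ordering like `ordered_features` would presumably determine the sequence in which factors are added to the RF model. How the authors chose the stopping point is not stated in the abstract.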
multidisciplinary sciences