Online interpretable dynamic prediction models for clinically significant posthepatectomy liver failure based on machine learning algorithms: A retrospective cohort study
Yuzhan Jin, Wanxia Li, Yachen Wu, Qian Wang, Zhiqiang Xiang, Zhangtao Long, Hao Liang, Jianjun Zou, Zhu Zhu, Xiaoming Dai
DOI: https://doi.org/10.1097/JS9.0000000000001764
2024-06-18
Abstract: Background: Posthepatectomy liver failure (PHLF) is the leading cause of mortality in patients undergoing hepatectomy. However, practical models for accurately predicting the risk of PHLF are lacking. This study aimed to develop precise prediction models for clinically significant PHLF. Methods: A total of 226 patients undergoing hepatectomy at a single center were recruited. The study outcome was clinically significant PHLF. Five pre- and postoperative machine learning (ML) models were developed and compared with four clinical scores, namely, the MELD, FIB-4, ALBI, and APRI scores. Model performance was assessed using the area under the curve (AUC) and the area under the precision‒recall curve (AUPRC). The developed ML models were internally validated using 5-fold cross-validation, with the evaluation metrics averaged across folds, and externally validated on an independent temporal dataset. SHapley Additive exPlanations analysis was performed to interpret the best-performing model. Results: Clinically significant PHLF was observed in 23 of 226 patients (10.2%). The variables in the preoperative model included creatinine, total bilirubin, and Child‒Pugh grade. In addition to these factors, the extent of resection was a key variable in the postoperative model. The pre- and postoperative artificial neural network (ANN) models exhibited excellent performance, with mean AUCs of 0.766 and 0.851, respectively, and mean AUPRC values of 0.441 and 0.645, respectively, whereas the MELD, FIB-4, ALBI, and APRI scores reached AUCs of 0.714, 0.498, 0.536, and 0.551, respectively, and AUPRC values of 0.204, 0.111, 0.128, and 0.163, respectively. On the temporal dataset, the pre- and postoperative ANN models achieved AUCs of 0.720 and 0.731, respectively, and AUPRC values of 0.380 and 0.408, respectively.
Conclusion: Our online interpretable dynamic ML models outperformed common clinical scores and could function as a clinical decision support tool to identify patients at high risk of PHLF pre- and postoperatively.
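For readers less familiar with the two metrics reported in the abstract, the following is a minimal sketch of how AUC and AUPRC can be computed from predicted risks. It is not the authors' code: the function names `roc_auc` and `average_precision`, and the toy `labels` and `scores`, are illustrative assumptions, and AUPRC is approximated here by average precision, which may differ from the estimator used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: the mean of the precision values obtained at
    the rank of each positive case (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = clinically significant PHLF, 0 = no PHLF.
labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.10, 0.20, 0.80, 0.30, 0.40, 0.35, 0.05, 0.90, 0.60, 0.15]

auc = roc_auc(labels, scores)
auprc = average_precision(labels, scores)

# Event rate reported in the abstract (23 of 226 patients).
prevalence = 23 / 226

print(f"AUC={auc:.3f}  AUPRC={auprc:.3f}  prevalence={prevalence:.3f}")
```

Because an uninformative classifier is expected to score an AUPRC close to the event rate, AUPRC values are usually read against the outcome prevalence rather than against 0.5, which is the chance level for AUC.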