Development and Validation of an Ensemble Learning Risk Model for Sepsis after Abdominal Surgery

Xin Shu, Yujie Li, Yiziting Zhu, Zhiyong Yang, Xiang Liu, Xiaoyan Hu, Chunyong Yang, Lei Zhao, Tao Zhu, Yuwen Chen, Bin Yi
DOI: https://doi.org/10.5114/aoms/189505
2024-01-01
Archives of Medical Science
Abstract: Although screening for patients at high risk of sepsis after abdominal surgery has gained attention, the clinical application of existing screening methods remains limited. We therefore aimed to develop and validate machine learning models that use routine variables to screen for patients at high risk of sepsis after abdominal surgery. The dataset comprised patients from three representative academic hospitals in China and the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. Routine clinical variables were used for model development, and Boruta was applied for feature selection. Ensemble learning and eight other conventional algorithms were then fitted and validated on both the full feature set and the selected features. Models were evaluated using the area under the receiver operating characteristic curve (ROCAUC), sensitivity, specificity, F1 score, accuracy, net reclassification index (NRI), integrated discrimination improvement (IDI), decision curve analysis (DCA), and calibration curves. A total of 955 patients undergoing abdominal surgery were included in the final analysis (sepsis: 285; non-sepsis: 670). After feature selection, the ensemble learning model integrating k-Nearest Neighbor (KNN) and Support Vector Machine (SVM) performed best, with a ROCAUC of 0.892 (0.841-0.944) and an accuracy of 85.0% on the test data, and a ROCAUC of 0.782 (0.727-0.838) and an accuracy of 68.1% on the validation data. Albumin, ASA score, neutrophil-lymphocyte ratio, age, and glucose were the top features associated with postoperative sepsis according to KNN and SVM. We developed a new and potentially generalizable model for preoperative screening of patients at high risk of sepsis after abdominal surgery, with the advantages of a representative training cohort and routine variables.
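As a rough illustration of how an ensemble of two classifiers might be scored with the metrics listed in the abstract, the sketch below averages two sets of predicted probabilities and computes sensitivity, specificity, F1 score, accuracy, and ROCAUC in plain Python. The abstract does not state how KNN and SVM were integrated, so the simple probability averaging (soft voting) used here is an assumption, as are the 0.5 decision threshold, all function names, and the toy numbers, none of which are study data.

```python
def soft_vote(prob_a, prob_b):
    """Average two lists of predicted probabilities (assumed integration rule)."""
    return [(a + b) / 2 for a, b in zip(prob_a, prob_b)]


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, F1 score, and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


def roc_auc(y_true, scores):
    """ROCAUC as the probability that a positive case outranks a negative one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROCAUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and probabilities, not taken from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    knn_prob = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1, 0.3, 0.2]
    svm_prob = [0.8, 0.7, 0.3, 0.3, 0.4, 0.2, 0.1, 0.6]

    ensemble_prob = soft_vote(knn_prob, svm_prob)
    y_pred = [1 if p >= 0.5 else 0 for p in ensemble_prob]  # assumed threshold

    print(classification_metrics(y_true, y_pred))
    print("ROCAUC:", roc_auc(y_true, ensemble_prob))
```

The pairwise ROCAUC computation is quadratic in the number of cases, which is fine for a cohort of this size; a rank-based version would scale better for larger datasets.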