Machine learning with interpretability predict surgical site infection after posterior cervical surgery

GuanRui Ren, ZhiYang Xie, YiYang Wang, Lei Liu, PeiYang Wang, Wei Zhang, YunTao Wang, XiaoTao Wu
DOI: https://doi.org/10.21203/rs.3.rs-869697/v1
2021-01-01
Abstract: Background: An ideal tool should not only identify risk factors but also give an explicit auxiliary answer as to whether a patient will develop surgical site infection (SSI). Machine learning (ML) models are able to carry out complicated predictive medical tasks. We aimed to develop ML models to predict SSI after posterior cervical surgery and to interpret their output. Methods: We retrospectively analyzed 235 patients who underwent posterior cervical surgery between June 2013 and April 2019 at Zhongda Hospital Affiliated to Southeast University. We established six models: an artificial neural network (ANN), XGBClassifier (xgboost), KNeighborsClassifier (KNN), a decision tree classifier (decision tree), a random forest classifier (random forest), and a support vector classifier (SVC). The receiver operating characteristic (ROC) curve, area under the curve (AUC), accuracy, recall, F1 score, and precision were calculated to measure model performance. Shapley values were calculated using SHapley Additive exPlanations (SHAP) to determine the relative feature importance of the xgboost model. Results: The incidence of SSI was 7.23%. Xgboost, random forest, ANN, KNN, decision tree, and SVC predicted SSI with AUCs of 0.9972, 0.9923, 0.9865, 0.9615, 0.9540, and 0.8934, respectively. Xgboost, ANN, decision tree, and random forest achieved excellent performance in the testing set. The 10 variables with the highest predictive contribution in the xgboost model were drainage volume, body mass index (BMI), drainage duration, intraoperative blood loss, cholesterol, sex, prognostic nutritional index (PNI), albumin, hypertension, and operation time. Conclusion: We successfully established ML models for individualized prediction of SSI after posterior cervical surgery. Xgboost, ANN, decision tree, and random forest achieved excellent performance and could provide auxiliary information for clinical decision makers. The interpretable model focuses on the contribution of important features to the predicted result. It can improve clinicians' acceptance of ML and promote the application of ML in actual clinical work.
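The evaluation metrics named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not describe their implementation, and the function names, the 0.5 decision threshold, and the eight toy labels and scores below are all assumptions invented for illustration, not study data. AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, List, Sequence


def classification_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = SSI, 0 = no SSI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 infected and 6 non-infected patients with model scores.
y_true: List[int] = [0, 0, 0, 0, 0, 0, 1, 1]
scores: List[float] = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.90]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
scores_at_half = classification_scores(y_true, y_pred)
auc = roc_auc(y_true, scores)

# Baseline that never predicts SSI, for comparison on imbalanced data.
baseline = classification_scores(y_true, [0] * len(y_true))

print(scores_at_half)
print(round(auc, 4))
print(baseline)
```

With a reported SSI incidence of 7.23% (about 17 of the 235 patients), a model that never predicts infection would still reach roughly 92.77% accuracy while detecting no cases. The toy baseline shows the same effect on a small scale, which is why recall, precision, F1, and AUC are worth reading alongside accuracy.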