Development and Internal Validation of Machine Learning to Predict Postoperative Worse Functional Status after Surgical Treatment for Thoracic Spinal Stenosis

Tun Liu, Jia Li, Huaguang Qi, Bin Guo, Songchuan Zhao, Baoping Zhang, Langbo Li, Gang Wu, Gang Wang
DOI: https://doi.org/10.12659/MSM.945310
2024-09-26
Abstract: BACKGROUND The objective of this study was to develop and validate machine learning (ML) algorithms to predict the 30-day and 6-month risk of deteriorating functional status following surgical treatment for thoracic spinal stenosis (TSS). We aimed to provide surgeons with tools to identify patients with TSS who are at higher risk of postoperative functional decline. MATERIAL AND METHODS The records of 327 patients with TSS who completed both follow-up visits were analyzed. The primary endpoint was the perioperative change in the Japanese Orthopedic Association (JOA) score, dichotomized as deteriorated or not deteriorated. Models were developed using Naïve Bayes, LightGBM, XGBoost, logistic regression, and random forest classifiers. The ML algorithms were trained, optimized, and tested, and model performance was assessed by accuracy and the c-statistic. RESULTS The best-performing algorithms for predicting functional decline at 30 days and 6 months after TSS surgery were XGBoost (accuracy=88.17%, c-statistic=0.83) and Naïve Bayes (accuracy=86.03%, c-statistic=0.80), respectively. Both algorithms showed good calibration and discrimination in our testing data. We identified several significant predictors, including a poor-quality intraoperative baseline of somatosensory evoked potentials/motor evoked potentials (SSEP/MEP), poor-quality preoperative SSEP, duration of symptoms, operated level, and motor dysfunction of the lower extremity.
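The two performance measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary outcome (1 = functional status deteriorated, 0 = did not) and a predicted probability per patient; for a binary outcome the c-statistic equals the area under the ROC curve, that is, the share of (deteriorated, not deteriorated) patient pairs in which the deteriorated patient received the higher predicted risk. The function names, the 0.5 threshold, and the toy values are hypothetical and are not taken from the study.

```python
from itertools import product


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label.

    The 0.5 cut-off is an assumption for illustration only.
    """
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def c_statistic(y_true, y_prob):
    """Concordance statistic for a binary outcome (equal to ROC AUC).

    Counts pairs (one positive case, one negative case) in which the
    positive case has the higher predicted risk; ties count as half.
    """
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each class")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Made-up toy data, not patient data from the paper.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]

print(f"accuracy    = {accuracy(y_true, y_prob):.3f}")
print(f"c-statistic = {c_statistic(y_true, y_prob):.3f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based implementation would be the usual choice for larger data.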