Development of a deep learning model for detecting lumbar vertebral fractures on CT images: An external validation
Jingyi Tian, Kexin Wang, Pengsheng Wu, Jialun Li, Xiaodong Zhang, Xiaoying Wang
DOI: https://doi.org/10.1016/j.ejrad.2024.111685
2024-08-15
Abstract: Objective: To develop and externally validate a deep learning model for binary classification of lumbar vertebral body fractures on CT images. Methods: Data were collected from two hospitals for model training and external validation. In Cohort A from Hospital 1, CT images from 248 patients, comprising 1508 vertebrae, showed that 20.9% were fractured (315 vertebrae) and 79.1% were non-fractured (1193 vertebrae). In Cohort B from Hospital 2, CT images from 148 patients, comprising 887 vertebrae, showed that 14.8% were fractured (131 vertebrae) and 85.2% were non-fractured (756 vertebrae). The AI model for lumbar spine fractures comprised two stages: vertebral body segmentation and fracture classification. The first stage used a 3D V-Net convolutional deep neural network to produce a 3D segmentation map. From this map, the region of each vertebral body was extracted and passed to the second stage of the algorithm. The second stage used a 3D ResNet convolutional deep neural network to classify each proposed region as positive (fractured) or negative (not fractured). Results: The AI model's accuracy for detecting vertebral fractures in Cohort A's training set (n = 1199), validation set (n = 157), and test set (n = 152) was 100.0%, 96.2%, and 97.4%, respectively. For Cohort B (n = 148), the accuracy was 96.3%. The area under the receiver operating characteristic curve (AUC-ROC) values for the training, validation, and test sets of Cohort A and for Cohort B, with their 95% confidence intervals (CIs), were 1.000 (1.000, 1.000), 0.978 (0.944, 1.000), 0.986 (0.969, 1.000), and 0.981 (0.970, 0.992), respectively. The corresponding area under the precision-recall curve (AUC-PR) values were 1.000 (0.996, 1.000), 0.964 (0.927, 0.985), 0.907 (0.924, 0.984), and 0.890 (0.846, 0.971). According to the DeLong test, there was no significant difference in AUC-ROC between the test set of Cohort A and Cohort B, either for the overall data or for any specific vertebral location (all P > 0.05). Conclusion: The developed model demonstrates promising diagnostic accuracy and applicability for detecting lumbar vertebral fractures.
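The cohort figures reported in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch for illustration only and is not part of the study's code: the names `COHORTS`, `COHORT_A_SPLITS`, and `class_percentages` are hypothetical, while the counts themselves are taken from the abstract.

```python
# Vertebra and patient counts as reported in the abstract.
# The dictionary layout and all names here are illustrative assumptions.
COHORTS = {
    "A": {"patients": 248, "fractured": 315, "non_fractured": 1193},
    "B": {"patients": 148, "fractured": 131, "non_fractured": 756},
}

# Cohort A split sizes (in vertebrae) as reported for the accuracy results.
COHORT_A_SPLITS = {"training": 1199, "validation": 157, "test": 152}


def class_percentages(fractured, non_fractured):
    """Return (total, % fractured, % non-fractured), rounded to one decimal."""
    total = fractured + non_fractured
    return (
        total,
        round(100 * fractured / total, 1),
        round(100 * non_fractured / total, 1),
    )


# Per-cohort totals and class proportions derived from the raw counts.
summary = {
    name: class_percentages(c["fractured"], c["non_fractured"])
    for name, c in COHORTS.items()
}

# The three Cohort A splits should account for every Cohort A vertebra.
split_total = sum(COHORT_A_SPLITS.values())

for name, (total, pct_pos, pct_neg) in summary.items():
    print(f"Cohort {name}: {total} vertebrae, "
          f"{pct_pos}% fractured, {pct_neg}% non-fractured")
print(f"Cohort A splits sum to {split_total} vertebrae")
```

Under these assumptions, the derived totals and percentages match the reported 1508 and 887 vertebrae and the stated class proportions, and the three Cohort A splits together cover all 1508 vertebrae.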