Lower-extremity fatigue fracture detection and grading based on deep learning models of radiographs

Yanping Wang, Yuexiang Li, Guang Lin, Qirui Zhang, Jing Zhong, Yan Zhang, Kai Ma, Yefeng Zheng, Guangming Lu, Zhiqiang Zhang
DOI: https://doi.org/10.1007/s00330-022-08950-w
IF: 7.034
2022-06-24
European Radiology
Abstract:
Objectives: To assess the feasibility of deep learning–based diagnostic models for detecting lower-extremity fatigue fractures and grading their severity on plain radiographs.
Methods: This retrospective study enrolled 1151 X-ray images (tibiofibula/foot: 682/469) of fatigue fractures and 2842 X-ray images (tibiofibula/foot: 2000/842) without abnormal findings from two clinical centers. After the lesions were labeled, images from one center (tibiofibula/foot: 2539/1180) were allocated at a 7:1:2 ratio for model construction, and the remaining images from the other center (tibiofibula/foot: 143/131) were used for external validation. A ResNet-50 and a triplet branch network were adopted to construct diagnostic models for detection and grading. The performance of the detection models was evaluated with sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), while the grading models were evaluated with accuracy derived from the confusion matrix. Visual assessments by radiologists were performed for comparison with the models.
Results: For the detection model on tibiofibula, a sensitivity of 95.4%/85.5%, a specificity of 80.1%/77.0%, and an AUC of 0.965/0.877 were achieved in the internal testing/external validation set. The detection model on foot reached a sensitivity of 96.4%/90.8%, a specificity of 76.0%/66.7%, and an AUC of 0.947/0.911. The detection models showed superior performance to the junior radiologist and performance comparable to the intermediate or senior radiologist. The overall accuracy of the diagnostic model was 78.5%/62.9% for tibiofibula and 74.7%/61.1% for foot in the internal testing/external validation set.
Conclusions: The deep learning–based models could be applied to the radiological diagnosis of plain radiographs to assist in the detection and grading of fatigue fractures of the tibiofibula and foot.
Key Points:
• Fatigue fractures on radiographs are relatively difficult to detect and apt to be misdiagnosed.
• Detection and grading models based on deep learning were constructed on a large cohort of radiographs with lower-extremity fatigue fractures.
• The detection model with high sensitivity would help to reduce the misdiagnosis of lower-extremity fatigue fractures.
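The evaluation metrics named in the abstract (sensitivity, specificity, AUC, and accuracy from a confusion matrix) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking formulation rather than any method stated in the paper.

```python
# Illustrative sketch only: toy data and hypothetical helper names,
# not taken from the paper.

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = fracture)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def grading_accuracy(confusion):
    """Overall accuracy: diagonal (correct grades) over all graded cases."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Toy detection example (assumed values).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
detection_auc = auc(y_true, scores)

# Toy grading confusion matrix (rows: true grade, columns: predicted grade).
toy_confusion = [[8, 2], [1, 9]]
acc = grading_accuracy(toy_confusion)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the abstract reports both.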
radiology, nuclear medicine & medical imaging