Artificial intelligence-based detection of paediatric appendicular skeletal fractures: performance and limitations for common fracture types and locations
Irmhild Altmann-Schneider, Christian J Kellenberger, Sarah-Maria Pistorius, Camilla Saladin, Debora Schäfer, Nidanur Arslan, Hanna L Fischer, Michelle Seiler
DOI: https://doi.org/10.1007/s00247-023-05822-3
Abstract:
Background: Research into artificial intelligence (AI)-based fracture detection in children is scarce and has disregarded the detection of indirect fracture signs and dislocations.
Objective: To assess the diagnostic accuracy of an existing AI tool for the detection of fractures, indirect fracture signs, and dislocations.
Materials and methods: The AI software BoneView (Gleamer, Paris, France) was assessed for diagnostic accuracy of fracture detection using paediatric radiology consensus diagnoses as reference. Radiographs from a single emergency department were enrolled retrospectively, going back from December 2021 and limited to 1,000 radiographs per body part. Enrolment criteria were as follows: suspected fractures of the forearm, lower leg, or elbow; age 0-18 years; and radiographs in at least two projections.
Results: Lower leg radiographs showed 607 fractures. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were high (87.5%, 87.5%, 98.3%, 98.3%, respectively). The detection rate was low for toddler's fractures, trampoline fractures, and proximal tibial Salter-Harris-II fractures. Forearm radiographs showed 1,137 fractures. Sensitivity, specificity, PPV, and NPV were high (92.9%, 98.1%, 98.4%, 91.7%, respectively). Radial and ulnar bowing fractures were not reliably detected (one out of 11 radial bowing fractures and zero out of seven ulnar bowing fractures were correctly detected). The detection rate was low for styloid process avulsions, proximal radial buckle fractures, and complete olecranon fractures. Elbow radiographs showed 517 fractures. Sensitivity and NPV were moderate (80.5%, 84.7%, respectively). Specificity and PPV were high (94.9%, 93.3%, respectively). For joint effusion, sensitivity, specificity, PPV, and NPV were moderate (85.1%, 85.7%, 89.5%, 80%, respectively). For elbow dislocations, sensitivity and PPV were low (65.8%, 50%, respectively). Specificity and NPV were high (97.7%, 98.8%, respectively).
Conclusions: The diagnostic performance of BoneView is promising for forearm and lower leg fractures. However, improvement is mandatory before clinicians can rely solely on AI-based paediatric fracture detection using this software.
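For readers less familiar with the four accuracy measures reported above, the following minimal Python sketch shows how sensitivity, specificity, PPV, and NPV are conventionally derived from counts of AI findings against a reference standard. The class `ConfusionCounts`, the helper `detection_rate`, and the counts in `example` are hypothetical and chosen only for illustration; they are not data from the study and do not represent the authors' analysis code. Only the bowing-fracture counts (1 of 11 radial, 0 of 7 ulnar) are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of AI findings versus the reference standard (hypothetical helper)."""
    tp: int  # fracture present, flagged by the AI
    fp: int  # no fracture, but flagged by the AI
    fn: int  # fracture present, missed by the AI
    tn: int  # no fracture, not flagged by the AI

    def sensitivity(self) -> float:
        # Share of true fractures that the AI detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of fracture-free cases that the AI correctly left unflagged.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Share of AI-flagged cases that truly had a fracture.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of unflagged cases that truly had no fracture.
        return self.tn / (self.tn + self.fn)


def detection_rate(detected: int, total: int) -> float:
    """Fraction of fractures of one subtype that were correctly detected."""
    return detected / total


# Hypothetical counts for illustration only; NOT taken from the study.
example = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)
print(f"sensitivity={example.sensitivity():.1%}, specificity={example.specificity():.1%}")
print(f"PPV={example.ppv():.1%}, NPV={example.npv():.1%}")

# Subtype counts as reported in the abstract.
radial_bowing = detection_rate(1, 11)
ulnar_bowing = detection_rate(0, 7)
print(f"radial bowing detected: {radial_bowing:.1%}, ulnar bowing detected: {ulnar_bowing:.1%}")
```

Because PPV and NPV depend on how common fractures are in the sample, values computed this way do not transfer directly to a setting with a different fracture prevalence, whereas sensitivity and specificity are less affected.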