Validation of an AI-Powered Automated X-ray Bone Age Analyzer in Chinese Children and Adolescents: A Comparison with the Tanner-Whitehouse 3 Method
Yan Liang, Xiaobo Chen, Rongxiu Zheng, Xinran Cheng, Zhe Su, Xiumin Wang, Hongwei Du, Min Zhu, Guimei Li, Yan Zhong, Shengquan Cheng, Baosheng Yu, Yu Yang, Ruimin Chen, Lanwei Cui, Hui Yao, Qiang Gu, Chunxiu Gong, Zhang Jun, Xiaoyan Huang, Deyun Liu, Xueqin Yan, Haiyan Wei, Yuwen Li, Huifeng Zhang, Yanjie Liu, Fengyun Wang, Gaixiu Zhang, Xin Fan, Hongmei Dai, Xiaoping Luo
DOI: https://doi.org/10.1007/s12325-024-02944-4
Abstract: Introduction: Automated bone age assessment (BAA) is of growing interest because of its accuracy and time efficiency in daily practice. In this study, we validated the clinical applicability of a commercially available artificial intelligence (AI)-powered X-ray bone age analyzer equipped with a deep learning-based automated BAA system and compared its performance with that of the Tanner-Whitehouse 3 (TW3) method. Methods: Radiographs of 900 Chinese children and adolescents, prospectively collected from 30 centers across various regions of China, were assessed independently by six doctors (three experts and three residents) and by the AI analyzer for TW3 radius, ulna, and short bones (RUS) bone age and TW3 carpal bone age. The mean of the experts' estimates was accepted as the gold standard. The performance of the AI analyzer was compared with that of each resident. Results: For TW3-RUS, the AI analyzer had a mean absolute error (MAE) of 0.48 ± 0.42 years, and 86.78% of patients had an absolute error of < 1.0 years. The MAE was significantly lower than that of rater 1 (0.54 ± 0.49, P = 0.0068), whereas the differences from rater 2 (0.48 ± 0.48) and rater 3 (0.49 ± 0.46) were not significant. For TW3 carpal, the AI analyzer had an MAE of 0.48 ± 0.65 years, and 88.78% of patients had an absolute error of < 1.0 years. The MAE was significantly lower than that of rater 2 (0.58 ± 0.67, P = 0.0018) and numerically lower than those of rater 1 (0.54 ± 0.64) and rater 3 (0.50 ± 0.53). These results were consistent across sex subgroups, whereas differences were observed between age groups. Conclusion: In this comprehensive validation study conducted in China, an AI-powered X-ray bone age analyzer showed accuracy that matched or exceeded that of doctor raters. This method may improve the efficiency of clinical routines by reducing reading time without compromising accuracy.
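The accuracy metrics reported in the abstract (MAE against the experts' mean estimate, and the percentage of patients with an absolute error of < 1.0 years) can be illustrated with a minimal sketch. This is not the study's analysis code: the function names and the bone age values are hypothetical, and the sketch assumes that the "±" figure is the sample standard deviation of the absolute errors, which the abstract does not state.

```python
from statistics import mean, stdev


def gold_standard(expert_readings):
    """Per-radiograph reference: mean of the expert estimates (years)."""
    return [mean(per_case) for per_case in expert_readings]


def accuracy_summary(estimates, reference, threshold=1.0):
    """MAE, SD of absolute errors, and % of cases with |error| < threshold."""
    abs_err = [abs(e - r) for e, r in zip(estimates, reference)]
    within = sum(err < threshold for err in abs_err)
    return {
        "mae": mean(abs_err),
        # Assumption: sample SD; the abstract does not say which SD was used.
        "sd": stdev(abs_err),
        "pct_within": 100.0 * within / len(abs_err),
    }


# Hypothetical bone ages in years: three expert readings per radiograph.
experts = [
    [10.0, 10.2, 10.4],
    [7.5, 7.5, 7.8],
    [13.0, 12.6, 12.8],
    [5.0, 5.3, 5.3],
]
# Hypothetical AI estimates for the same four radiographs.
ai = [10.5, 7.2, 14.0, 5.2]

reference = gold_standard(experts)
summary = accuracy_summary(ai, reference)
print(f"MAE = {summary['mae']:.2f} ± {summary['sd']:.2f} years")
print(f"|error| < 1.0 years: {summary['pct_within']:.2f}%")
```

The same function can be applied to each resident's estimates to obtain per-rater figures comparable to those of the AI analyzer. The significance test behind the reported P values is not specified in the abstract, so it is left out of this sketch.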