Accuracy of bone age assessment system based on deep learning in children with abnormal growth and development
常沙,闫东,杜霞,张玉巧,程晓光,杨洁,宋玲玲,高波,罗贤
DOI: https://doi.org/10.3760/cma.j.cn112149-20230210-00087
2023-01-01
Abstract: Objective: To evaluate the accuracy of a deep learning-based artificial intelligence (AI) system for bone age assessment in children with abnormal growth and development. Methods: Frontal radiographs of the left wrist of children with abnormal growth and development who were treated at the Affiliated Hospital of Guizhou Medical University from January 2020 to December 2021 were collected retrospectively. A total of 717 children were included (266 boys and 451 girls), aged 2-18 (11±3) years. Three senior radiologists assessed bone age using the Tanner-Whitehouse 3 (TW3)-RUS (radius, ulna, and short bones) and TW3-Carpal (carpal bones) methods, and the mean of their assessments served as the reference standard. Bone age was then evaluated independently by the AI system (Dr.Wise bone age prediction software) and by two junior radiologists (physicians 1 and 2). For each of these, the accuracy within 0.5 year, the accuracy within 1 year, the mean absolute error (MAE), and the root mean square error (RMSE) relative to the reference standard were calculated. Paired-sample t-tests were used to compare the MAE of the AI system with that of the junior radiologists. The intraclass correlation coefficient (ICC) was used to evaluate the agreement of the AI system and of the junior radiologists with the reference standard. Bland-Altman plots were drawn, and the 95% limits of agreement between the AI system and the reference standard were calculated. Results: For TW3-RUS bone age, compared with the reference standard, the accuracy within 0.5 year of the AI system, physician 1, and physician 2 was 75.3% (540/717), 62.1% (445/717), and 66.2% (475/717), respectively. The accuracy within 1 year was 96.9% (695/717), 86.3% (619/717), and 89.1% (639/717), respectively. The MAE was 0.360, 0.565, and 0.496 years, and the RMSE was 0.469, 0.634, and 0.572 years, respectively.
For TW3-Carpal bone age, compared with the reference standard, the accuracy within 0.5 year of the AI system, physician 1, and physician 2 was 80.9% (580/717), 65.1% (467/717), and 71.7% (514/717), respectively. The accuracy within 1 year was 96.0% (688/717), 87.3% (626/717), and 90.4% (648/717), respectively. The MAE was 0.330, 0.527, and 0.455 years, and the RMSE was 0.458, 0.612, and 0.538 years, respectively. For both TW3-RUS and TW3-Carpal bone age, the MAE of the AI system was lower than that of physician 1 and physician 2, and the differences were statistically significant (all P<0.001). The assessments of the AI system, physician 1, and physician 2 were all in good agreement with the reference standard (all ICC>0.950). Bland-Altman analysis showed that the 95% limits of agreement of the AI system were -0.75 to 1.02 years for TW3-RUS bone age and -0.86 to 0.91 years for TW3-Carpal bone age. Conclusion: In children with abnormal growth and development, the accuracy of the AI system in assessing bone age is close to that of senior radiologists and better than that of junior radiologists, and its results are in good agreement with those of senior radiologists.
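The agreement measures named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names (`agreement_metrics`, `bland_altman_limits`) and the toy bone ages are hypothetical. The sketch also assumes that "within 0.5 year" means an absolute error of at most 0.5 year, and that the 95% limits of agreement are the mean difference ± 1.96 times the sample standard deviation of the differences, which is the usual Bland-Altman convention. The abstract does not state either detail.

```python
import math
from statistics import mean, stdev


def agreement_metrics(predicted, reference):
    """Accuracy within 0.5 and 1 year, MAE and RMSE (years) against a reference."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("predicted and reference must be non-empty and equal in length")
    errors = [p - r for p, r in zip(predicted, reference)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)
    return {
        # Assumption: "within" is inclusive (absolute error <= threshold).
        "acc_within_0.5y": sum(a <= 0.5 for a in abs_errors) / n,
        "acc_within_1y": sum(a <= 1.0 for a in abs_errors) / n,
        "mae": mean(abs_errors),
        "rmse": math.sqrt(mean([e * e for e in errors])),
    }


def bland_altman_limits(predicted, reference):
    """95% limits of agreement: mean difference +/- 1.96 * SD of the differences."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical bone ages in years, for illustration only (not study data).
    ai_bone_age = [10.2, 11.9, 8.0, 13.6]
    reference_bone_age = [10.0, 12.5, 8.0, 12.4]
    print(agreement_metrics(ai_bone_age, reference_bone_age))
    print(bland_altman_limits(ai_bone_age, reference_bone_age))
```

Because the RMSE squares each error before averaging, it is never smaller than the MAE and weights large individual errors more heavily. This is consistent with every reported RMSE exceeding the corresponding MAE.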