Automated Skeletal Bone Age Assessment with Two-Stage Convolutional Transformer Network Based on X-ray Images

Xiongwei Mao,Qinglei Hui,Siyu Zhu,Wending Du,Chenhui Qiu,Xiaoping Ouyang,Dexing Kong
DOI: https://doi.org/10.3390/diagnostics13111837
2023-05-24
Abstract:Human skeletal development is continuous and staged, and different stages have various morphological characteristics. Therefore, bone age assessment (BAA) can accurately reflect the individual's growth and development level and maturity. Clinical BAA is time consuming, highly subjective, and lacks consistency. Deep learning has made considerable progress in BAA in recent years by effectively extracting deep features. Most studies use neural networks to extract global information from input images. However, clinical radiologists are highly concerned about the ossification degree in some specific regions of the hand bones. This paper proposes a two-stage convolutional transformer network to improve the accuracy of BAA. Combined with object detection and transformer, the first stage mimics the bone age reading process of the pediatrician, extracts the hand bone region of interest (ROI) in real time using YOLOv5, and proposes hand bone posture alignment. In addition, the previous information encoding of biological sex is integrated into the feature map to replace the position token in the transformer. The second stage extracts features within the ROI by window attention, interacts between different ROIs by shifting the window attention to extract hidden feature information, and penalizes the evaluation results using a hybrid loss function to ensure its stability and accuracy. The proposed method is evaluated on the data from the Pediatric Bone Age Challenge organized by the Radiological Society of North America (RSNA). The experimental results show that the proposed method achieves a mean absolute error (MAE) of 6.22 and 4.585 months on the validation and testing sets, respectively, and the cumulative accuracy within 6 and 12 months reach 71% and 96%, respectively, which is comparable to the state of the art, markedly reducing the clinical workload and realizing rapid, automatic, and high-precision assessment.
What problem does this paper attempt to address?