Using deep learning and natural language processing models to detect child physical abuse

Niti Shahi, Ashwani K Shahi, Ryan Phillips, Gabrielle Shirek, Daniel M Lindberg, Steven L Moulton
DOI: https://doi.org/10.1016/j.jpedsurg.2021.03.007
Abstract: Background: The recognition of child physical abuse can be challenging and often requires a multidisciplinary assessment. Deep learning models, based on clinical characteristics, laboratory studies, and imaging findings, were developed to facilitate unbiased identification of children who may have been abused. Methods: Level 1 pediatric trauma center registry data from 1/1/2010 to 1/31/2020 were queried for abused children and matched participants with non-abusive trauma. Observations were de-identified and divided into training and validation sets. Model 1 used patient demographics (age, gender, and insurance type) and clinical characteristics (vital signs, shock index pediatric age-adjusted, Glasgow Coma Score, lactate, base deficit, and international normalized ratio). Model 2 used the same features as Model 1, plus the text of the radiology reports from head computed tomography, brain MRI, and skeletal surveys. Google's latest BERT natural language processing (NLP) model, which was pre-trained on a large corpus, was fine-tuned for Model 2. Accuracy, sensitivity, specificity, F1 score, and positive predictive value (PPV) were used to assess performance. Results: Of 1,312 patients, 737 (56.2%) were abused. On the validation set, Model 1 had an accuracy of 86.3%, sensitivity of 87.2%, specificity of 85.1%, F1 score of 0.86, and PPV of 88.7%, with an area under the receiver operating characteristic curve (ROC AUC) of 0.86. The NLP-based Model 2 had an accuracy of 93.4%, sensitivity of 92.5%, specificity of 94.6%, F1 score of 0.93, and PPV of 95.9% on the validation set, with a ROC AUC of 0.94. Most features had weak individual correlations with abuse (r < 0.3). Conclusions: Deep learning models accurately distinguished child physical abuse from non-abuse, and NLP further improved the accuracy of the models. Such models could be developed to run in real time in the electronic medical record and alert clinicians when certain criteria are met, prompting them to pursue the diagnosis of abuse. Level of evidence: III. Study type: Diagnostic.
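The performance measures named in the abstract (accuracy, sensitivity, specificity, PPV, and F1 score) all follow from the four cells of a binary confusion matrix. The sketch below shows how they could be computed; it is not the authors' code. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    f1: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: abused cases predicted as abuse / as non-abuse.
    tn/fp: non-abused cases predicted as non-abuse / as abuse.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(num: float, den: float) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return Metrics(
        accuracy=ratio(tp + tn, total),
        sensitivity=ratio(tp, tp + fn),        # true positive rate (recall)
        specificity=ratio(tn, tn + fp),        # true negative rate
        ppv=ratio(tp, tp + fp),                # positive predictive value (precision)
        f1=ratio(2 * tp, 2 * tp + fp + fn),    # harmonic mean of PPV and sensitivity
    )


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the paper.
    m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
    print(f"accuracy={m.accuracy:.3f} sensitivity={m.sensitivity:.3f} "
          f"specificity={m.specificity:.3f} ppv={m.ppv:.3f} f1={m.f1:.3f}")
```

Note that PPV depends on the proportion of abused cases in the evaluated sample, so a PPV measured on a matched cohort would not carry over directly to a population with a different prevalence.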