Prediction of intra-abdominal injury using natural language processing of electronic medical record data
Giovanna Danna, Ravi Garg, Joanna Buchheit, Radha Patel, Tiannan Zhan, Alexander Ellyn, Farhan Maqbool, Linda Yala, Yuriy Moklyak, James Frydman, Abel Kho, Nan Kong, Alona Furmanchuk, Alexander Lundberg, Anne M Stey
DOI: https://doi.org/10.1016/j.surg.2024.05.042
IF: 4.348
Surgery
Abstract: Background: This study aimed to use natural language processing to predict the presence of intra-abdominal injury using unstructured data from electronic medical records. Methods: This was a random-sample retrospective observational cohort study leveraging unstructured data from injured patients taken to one of 9 acute care hospitals in an integrated health system between 2015 and 2021. Patients with International Classification of Diseases External Cause of Morbidity codes were identified. History and physical, consult, progress, and radiology report text from the first 8 hours of care was abstracted. Annotator dyads independently annotated each encounter's text files to establish ground truth regarding whether intra-abdominal injury occurred. Features were extracted from text using natural language processing techniques, bag of words, and principal component analysis. We tested logistic regression, random forests, and gradient boosting machines to determine the accuracy, recall, and precision of natural language processing in predicting intra-abdominal injury. Results: A random sample of 7,000 of 177,127 patient encounters was annotated. Only 2,951 had sufficient information to determine whether an intra-abdominal injury was present. Among those, 84 (2.9%) had an intra-abdominal injury. The concordance between annotators was 0.989. Logistic regression on features identified with bag of words and principal component analysis had the best predictive ability, with an area under the receiver operating characteristic curve of 0.9, recall of 0.73, and precision of 0.17. Text features with the greatest importance included "abdomen," "pelvis," "spleen," and "hematoma." Conclusion: Natural language processing could be a screening decision support tool, which, if paired with human clinical assessment, can maximize precision of intra-abdominal injury identification.
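Illustrative note: the reported recall (0.73) and precision (0.17) can be translated into approximate patient counts to show what they would mean for screening. The sketch below is a back-of-the-envelope calculation, not the authors' analysis. It assumes, hypothetically, that both metrics apply to all 84 annotated injuries; the abstract does not say whether they were computed on the full annotated set or on a held-out subset, so the counts are indicative only. The function name `implied_counts` is made up for this example.

```python
def implied_counts(n_positive: int, recall: float, precision: float) -> dict:
    """Back out approximate confusion-matrix counts from recall and precision.

    Assumption (not stated in the abstract): both metrics were measured
    on a set containing exactly n_positive true injuries.
    """
    # recall = TP / (TP + FN), and TP + FN is the number of true injuries
    tp = round(recall * n_positive)
    fn = n_positive - tp
    # precision = TP / (TP + FP), so TP / precision is the number flagged
    flagged = round(tp / precision)
    fp = flagged - tp
    return {"tp": tp, "fn": fn, "fp": fp, "flagged": flagged}


counts = implied_counts(n_positive=84, recall=0.73, precision=0.17)
print(counts)  # {'tp': 61, 'fn': 23, 'fp': 298, 'flagged': 359}
```

Under that assumption, the model would flag roughly 359 encounters to catch about 61 of the 84 injuries, or close to five false alarms for every true injury. This is consistent with the conclusion that the tool is suited to screening and should be paired with clinical assessment to raise precision.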