Automated stratification of trauma injury severity across multiple body regions using multi-modal, multi-class machine learning models

Jifan Gao, Guanhua Chen, Ann P. O’Rourke, John Caskey, Kyle Carey, Madeline Oguss, Anne Stey, Dmitriy Dligach, Timothy Miller, Anoop Mayampurath, Matthew M. Churpek, Majid Afshar
DOI: https://doi.org/10.1101/2024.01.22.24301489
2024-01-22
Abstract: The timely stratification of trauma injury severity can enhance the quality of trauma care but it requires intense manual annotation from certified trauma coders. There is a need to establish an automated tool to identify the severity of trauma injuries across various body regions. We gather trauma registry data from a Level I Trauma Center at the University of Wisconsin-Madison (UW Health) between 2015 and 2019. Our study utilizes clinical documents and structured electronic health records (EHR) variables linked with the trauma registry data to create two machine learning models with different approaches to representing text. The first one fuses concept unique identifiers (CUIs) extracted from free text with structured EHR variables, while the second one integrates free text with structured EHR variables. Both models demonstrate impressive performance in categorizing leg injuries, achieving high accuracy with macro-F1 scores of around 0.8. Additionally, they show considerable accuracy, with macro-F1 scores exceeding 0.6, in assessing injuries in the areas of the chest and head. Temporal validation is conducted to ensure the models’ temporal generalizability. We show in our variable importance analysis that the most important features in the model have strong face validity in determining clinically relevant trauma injuries.
Health Informatics
What problem does this paper attempt to address?
The paper aims to stratify trauma injury severity automatically using multi-modal, multi-class machine learning models, reducing the need for manual annotation by certified trauma coders and improving the quality of trauma care. Specifically, the research uses clinical documents and structured electronic health record (EHR) variables to build automated tools that predict injury severity across multiple body regions, with particular attention to key regions such as the legs, chest, and head.

### Background and Motivation

Trauma is the leading cause of death among people under 45 years old and results in more than 3.5 million hospitalizations in the United States every year. Trauma registry systems play a crucial role in improving trauma care and related clinical outcomes because they clarify injury patterns and identify areas for improvement. However, assessing trauma severity requires certified trauma coders to review the EHR with software tools, which is a time-consuming and labor-intensive process. In addition, trauma scores are usually recorded after the patient is discharged, which limits their usefulness during active treatment. Automatically stratifying trauma scores during the course of care could therefore enable more comprehensive and timely data capture and make the process easier to scale across centers.

### Research Methods

The research team collected trauma registry data from 2015 to 2019 at the Level I Trauma Center at the University of Wisconsin-Madison. They developed two machine learning models, one for each of two text representations:

1. **CUIs + Structured EHR Model**: Merges the concept unique identifiers (CUIs) extracted from free text with structured EHR variables.
2. **Free-text + Structured EHR Model**: Processes free text directly and merges it with structured EHR variables.

### Model Architecture

- **Text Encoding**:
  - **CUIs + Structured EHR Model**: Uses a one-dimensional convolutional neural network (1D CNN) to encode CUIs.
  - **Free-text + Structured EHR Model**: Uses a fine-tuned ClinicalBERT to encode free text.
- **Structured EHR Encoding**: Uses a pre-trained fully connected neural network to encode structured EHR variables.
- **Multi-task Neural Network**: A multi-task neural network is pre-trained on a binary classification task, and its shared layer serves as the encoder for structured EHR data.
- **Fusion and Prediction**: Each model fuses its text embedding (CUI or free-text) with the structured EHR embedding and performs multi-class prediction through a multi-layer perceptron (MLP).

### Main Results

- **Performance Evaluation**: Both models perform well in leg trauma stratification, with macro-F1 scores close to 0.8. In the chest, abdomen and spine (chest abdspine) region and the head, face and neck (head faceneck) region, macro-F1 scores also exceed 0.6.
- **Contribution Analysis**: Structured EHR data contributes most in the arm and extremities (arm ext) region, especially in the CUIs + Structured EHR model.

### Conclusion

This research developed two multi-modal machine learning models that automatically stratify the severity of trauma injuries across multiple body regions. The models could make trauma assessment more efficient and timely, and they provide a foundation for future automated support of trauma care.
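
The results above are reported as macro-F1 scores. As a minimal sketch of how that metric is computed, the function below calculates a per-class F1 and takes the unweighted mean, so rare severity classes count as much as common ones. The label values and example predictions are hypothetical and are not taken from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical severity classes (0, 1, 2) for one body region.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred)
print(round(score, 3))  # per-class F1 is 0.75, 0.8, and 2/3, so the mean is 0.739
```

Because the average is unweighted, a model that ignores an uncommon severity class is penalized heavily, which is usually the desired behavior for imbalanced clinical labels.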
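
The fusion step described in the model architecture can be illustrated with a small, dependency-free sketch. It assumes fusion by simple concatenation of the text embedding and the structured EHR embedding, followed by a two-layer MLP with a softmax output; the summary does not state the fusion operator, so this is an assumption. All dimensions, the number of classes, and the random weights are hypothetical stand-ins, not the paper's configuration.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Random weights for illustration only; a real model would be trained."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_predict(text_emb, ehr_emb, hidden, out):
    """Concatenate the two modality embeddings, then apply a two-layer MLP."""
    fused = list(text_emb) + list(ehr_emb)  # assumed fusion: concatenation
    h = relu(linear(fused, *hidden))
    return softmax(linear(h, *out))


rng = random.Random(0)
TEXT_DIM, EHR_DIM, HIDDEN_DIM, N_CLASSES = 8, 4, 6, 3  # hypothetical sizes
hidden = init_layer(rng, TEXT_DIM + EHR_DIM, HIDDEN_DIM)
out = init_layer(rng, HIDDEN_DIM, N_CLASSES)

text_emb = [rng.random() for _ in range(TEXT_DIM)]  # stand-in for the CNN or BERT output
ehr_emb = [rng.random() for _ in range(EHR_DIM)]    # stand-in for the EHR encoder output
probs = fuse_and_predict(text_emb, ehr_emb, hidden, out)
print(probs)  # one probability per hypothetical severity class
```

In practice the two embeddings would come from the trained text encoder and the pre-trained structured EHR encoder, and the MLP weights would be learned jointly rather than sampled at random.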