Development of a Predictive Model to Determine Appropriate Length of Profile

David Hazel, Justin Stewart, Triana Rivera-Nichols, Jeanette Little
DOI: https://doi.org/10.1093/milmed/usae159
2024-08-19
Abstract:
Introduction: Musculoskeletal injuries are one of the primary causes of Soldiers' inability to be medically ready, accounting for over 80% of such causes. The electronic profile (e-Profile) is used to document musculoskeletal injuries so that commanders know the type of injury and how long the Soldier will need limited duty. A previous study of e-Profiles in an Army MTF Integrated Pain Management Center showed that the median length of an e-Profile was 30 days. It is in the best interest of the Army to keep the Soldier out of the fight for only the minimum time needed for recovery, in order to preserve unit readiness. The goal of this study was to use e-Profile data to determine whether a machine learning model can be developed to predict the appropriate time a Soldier needs to be on profile for a given diagnosis.
Materials and methods: Institutional Review Board approval was obtained from the USAMRDC (protocol #M-10966). The initial dataset provided to the research team consisted of a single pipe-delimited ("|") text file containing 2.9 million rows of e-Profile data. Linear regression, decision trees, and random forests (RFs) were evaluated to see which model would best predict the number of days needed for an e-Profile.
Results: Linear regression had an adjusted R-squared of 0.165. For decision trees, the positive predictive value (0-to-30-day range of e-Profiles) was 73.6% and the negative predictive value (30-to-90-day range) was 60.9%, with an area under the receiver operating characteristic curve (AUC) of 0.694. For RFs, the positive predictive value (0-to-30-day range) was 85.3% and the negative predictive value (30-to-90-day range) was 58.7%, with an AUC of 0.794. An AUC closer to 1 indicates a more accurate prediction.
Conclusions: The 3 models studied as part of this project (linear regression, decision trees, and RFs) did not predict the days on e-Profile with a high degree of certainty. Future research will focus on adding data to the e-Profile dataset in order to improve model accuracy.
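The evaluation metrics reported in the Results (positive predictive value, negative predictive value, and AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the labels and scores below are invented toy values, the function names are hypothetical, and treating the 0-to-30-day range as the positive class is an assumption based on how the abstract pairs that range with the positive predictive value.

```python
def predictive_values(y_true, y_pred):
    """Return (PPV, NPV) for binary labels.

    Assumption: 1 = short profile (0-30 days), 0 = long profile (30-90 days).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # PPV: fraction of predicted positives that are truly positive.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # NPV: fraction of predicted negatives that are truly negative.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration, not taken from the study).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.5]  # model's score for class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # hypothetical 0.5 cutoff

ppv, npv = predictive_values(y_true, y_pred)
area = auc(y_true, scores)
print(f"PPV={ppv:.3f} NPV={npv:.3f} AUC={area:.4f}")
```

An AUC of 0.5 corresponds to ranking cases no better than chance, which gives some context for the reported values of 0.694 and 0.794.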