Development and Evaluation of Machine Learning Models for the Detection of Emergency Department Patients with Opioid Misuse from Clinical Notes

Usman Shahid, Natalie Parde, Dale L Smith, Grayson Dickinson, Joseph Bianco, Dillon Thorpe, Madhav Hota, Majid Afshar, Niranjan S Karnik, Neeraj Chhabra
DOI: https://doi.org/10.1101/2024.12.11.24318875
2024-12-12
Abstract: Objectives: The accurate identification of Emergency Department (ED) encounters involving opioid misuse is critical for health services, research, and surveillance. We sought to develop natural language processing (NLP)-based models for the detection of ED encounters involving opioid misuse. Methods: A sample of ED encounters enriched for opioid misuse was manually annotated, and clinical notes were extracted. We evaluated classic machine learning (ML) methods, fine-tuning of publicly available pretrained language models, and a previously developed convolutional neural network opioid classifier for use on hospitalized patients (SMART-AI). Performance was compared to ICD-10-CM codes. Both raw text and text transformed to Unified Medical Language System concepts were evaluated. Face validity was evaluated by term feature importance. Results: There were 1123 encounters used for training, validation, and testing. Of the classic ML methods, XGBoost had the highest area under the precision-recall curve (AU_PRC; 0.936), accuracy (0.887), and F1 score (0.863), outperforming ICD-10-CM codes [accuracy 0.870; F1 0.830]. Logistic regression, support vector machine, and XGBoost models had higher AU_PRC using transformed text, while decision trees performed better using raw text. Excluding XGBoost, fine-tuned pretrained language models outperformed classic ML methods. The best performing model was the fine-tuned SMART-AI-based model with domain adaptation [AU_PRC 0.948; accuracy 0.882; F1 0.851]. Explainability analyses showed the most predictive terms were "heroin", "opioids", "alcoholic intoxication, chronic", "cocaine", "opiates", and "suboxone". Conclusions: NLP-based models outperform ICD-10-CM diagnosis codes for the detection of ED encounters with opioid misuse. Fine-tuning with domain adaptation for pretrained language models resulted in improved performance.
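For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how accuracy, F1, and a step-wise estimate of the area under the precision-recall curve (average precision) can be computed for a binary classifier. It is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of encounters whose predicted label matches the annotation."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Encounters are ranked by descending score; precision is averaged over
    the ranks at which a true positive appears. Tied scores are not given
    special treatment in this sketch.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos


# Toy data (hypothetical, NOT from the study): 1 = opioid misuse, 0 = none.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # model-assigned probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("average precision:", average_precision(y_true, scores))
```

Accuracy and F1 depend on the chosen decision threshold, whereas the precision-recall area summarizes ranking quality across all thresholds, which is one reason a model can lead on one metric and trail on another.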