A Natural Language Processing Pipeline based on the Columbia-Suicide Severity Rating Scale

Lauren A. Lepow MD, PhD; Prakash Adekkanattu PhD; Marika Cusick MS; Hilary Coon PhD; Brian Fennessy MS; Shane OConnell PhD; Charlotte Pierce MD; Jessica Rabbany MD; Mohit Sharma MPH; Mark Olfson MD; Amanda Bakian PhD; Yunyu Xiao PhD; Niamh Mullins PhD; Girish N. Nadkarni MD, MPH; Alexander W. Charney MD, PhD; Jyotishman Pathak PhD; J. John Mann MD
DOI: https://doi.org/10.1101/2024.12.19.24319352
2024-12-20
Abstract:
Importance: Diagnostic codes in the Electronic Health Record (EHR) are known to be limited in capturing patient suicidality, especially in differentiating levels of suicide severity.
Objective: The authors developed and validated a portable natural language processing (NLP) algorithm to detect suicidal ideation (SI) and suicide-related behavior and attempts (SB/SA) in EHR data. The algorithm was then deployed, and its ascertainment of SI and SB/SA was compared with that of International Statistical Classification of Diseases (ICD-9 and ICD-10) diagnostic codes.
Design: A group of experts designed the pipeline to detect and distinguish levels of suicide severity based on the Columbia-Suicide Severity Rating Scale (C-SSRS). Clinical notes were manually annotated to create a Gold Standard against which the accuracy of the algorithm output was evaluated.
Setting: The algorithm was developed at two academic medical centers, Weill Cornell Medicine (WCM) and the Mount Sinai Health System (MSHS), and tested at these two sites plus a third, the University of Utah Healthcare Center (UUHSC).
Participants: Notes came from participants with psychiatric encounters at the three institutions.
Main Outcomes: The two main outcomes were the accuracy of the NLP pipeline, measured by F1 score, and its ascertainment rates compared with those of ICD codes.
Results: F1 scores ranged from 0.86 to 0.97 across the three sites. Compared with diagnostic codes, NLP detected SB/SA at a rate almost 30 times higher and SI at a rate almost 10 times higher. NLP also detected almost all cases identified by diagnostic codes. No performance bias was found by race/ethnicity, and performance was comparable in psychiatric and non-psychiatric EHRs.
Conclusions and Relevance: Using an NLP algorithm based on parameters defined by the C-SSRS, SI and SB/SA were extracted from the EHRs of cohorts with psychiatric diagnoses or encounters at WCM, MSHS, and UUHSC.
Validity was determined by comparing the algorithm output with manual annotations of clinical notes by domain experts. NLP detection of SI and SB/SA was compared with that of ICD codes across a range of demographic groups. Algorithm performance was also examined in minoritized groups, to assess bias, and in non-psychiatric notes.
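The abstract reports F1 scores against the manually annotated Gold Standard and NLP-to-ICD ascertainment ratios, but it does not spell out how these quantities are computed. The Python sketch below shows one common way to compute them, assuming note-level binary labels for a single category (for example, SI present or absent) and patient-level sets of flagged IDs. The names `f1_against_gold`, `compare_ascertainment`, and `Ascertainment`, as well as all data shown, are hypothetical toy values for illustration only, not the authors' code or study results.

```python
from dataclasses import dataclass
from typing import Sequence


def f1_against_gold(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """F1 of binary predictions against gold-standard (manual annotation) labels.

    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = 2 * precision * recall / (precision + recall).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


@dataclass
class Ascertainment:
    rate_ratio: float     # NLP detection rate divided by ICD detection rate
    icd_recovered: float  # share of ICD-positive patients also flagged by NLP


def compare_ascertainment(nlp_positive: set[str], icd_positive: set[str],
                          cohort_size: int) -> Ascertainment:
    """Compare patient-level detection by NLP versus ICD codes within one cohort."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    nlp_rate = len(nlp_positive) / cohort_size
    icd_rate = len(icd_positive) / cohort_size
    rate_ratio = nlp_rate / icd_rate if icd_rate > 0 else float("inf")
    icd_recovered = (len(nlp_positive & icd_positive) / len(icd_positive)
                     if icd_positive else float("nan"))
    return Ascertainment(rate_ratio, icd_recovered)


# Toy note-level labels: manual annotation vs. hypothetical NLP output.
gold = [True, True, True, True, False, False, False, False, True, False]
nlp = [True, True, True, False, False, False, True, False, True, False]
print("F1:", round(f1_against_gold(gold, nlp), 3))

# Toy patient-level flags, stratified by a hypothetical demographic field.
groups = {
    "group_a": ({f"a{i}" for i in range(20)}, {"a0", "a1"}, 100),
    "group_b": ({f"b{i}" for i in range(9)}, {"b0", "b1", "b50"}, 60),
}
for name, (nlp_ids, icd_ids, n) in groups.items():
    result = compare_ascertainment(nlp_ids, icd_ids, n)
    print(name, round(result.rate_ratio, 2), round(result.icd_recovered, 2))
```

Reporting the recovered share alongside the rate ratio matters: a higher detection rate on its own would not show that NLP also captures the cases that ICD codes already identify, and the abstract reports both findings.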