1307-P: Natural Language Processing of Clinical Notes to Find Diabetes Type and Onset Year in Children and Young Adults

Anthony Wong, Victor W. Zhong, Marc Rosenman
DOI: https://doi.org/10.2337/db24-1307-p
IF: 7.7
2024-01-01
Diabetes
Abstract: Studying diabetes (DM) is challenging when relying only on structured electronic health record (EHR) data. Unstructured clinical notes are underutilized but might improve the accuracy of EHR analyses. We developed and validated a rule-based natural language processing (NLP) method for finding DM type and incident diagnosis year in clinical notes. At a single center, we used structured EHR data to identify a cohort aged <45 years with likely type 1 (T1D) or type 2 (T2D) DM in 2016-2019 records. We previously trained our NLP algorithm on 58,450 randomly selected clinical notes (1,465 patients). We required 3 distinct concepts at the sentence level to determine an incident DM diagnosis: a DM mention, an onset attribute (e.g., “diagnosed in”), and a temporal component (e.g., 2008). We have now added NLP rules to distinguish DM type (T1D, T2D, other, or no type mentioned) and enhanced the detection of onset year. For DM type, the NLP assigns the type found most often in the tripartite onset sentences; if no type is mentioned there, it uses the plurality of mentions of T1D, T2D, or other across all notes. For DM onset year, the NLP assigns the year with a plurality of mentions. We tested the handcrafted NLP rules against manual review in an independent set of 100 randomly selected patients from the cohort. Analysis was at the patient level. Our final test set had 22,791 clinical notes from 97 patients (3 had no notes). On manual review, 73 had T1D, 15 had T2D, 5 had other DM, and 4 had no DM. NLP assigned a DM type in 95 patients (83 via the onset sentence, plus 12 via the plurality method). The NLP type was correct in 65/73 (89%) for T1D (5 incorrect, 3 not found), 12/15 (80%) for T2D (3 incorrect), and 2/5 (40%) for other DM. NLP assigned an onset year in 80 patients (79 with true DM). Manual review found an onset year in 84/93 patients with DM (the remaining 9 had onset “a few years ago” or prevalent DM at first visit). The NLP onset year was correct in 77 (92%) of the 84 (2 incorrect, 5 no year found).
NLP was beneficial in identifying DM type and onset timing. It may complement structured EHR data in DM research and surveillance.
Disclosure: A. Wong: None. V.W. Zhong: None. M. Rosenman: None.
Funding: CDC 1U18DP006693 and 1U18DP006694
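
The rule logic described in the abstract (a sentence counts as an onset sentence only if it contains a DM mention, an onset attribute, and a year; DM type and onset year are then chosen by plurality) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the regular expressions, the sentence splitter, the tie-breaking behavior, and all function names (`onset_sentences`, `plurality`, `classify_patient`) are assumptions introduced here, since the abstract does not publish the actual rules.

```python
import re
from collections import Counter

# Hypothetical patterns: the abstract names the three concepts but not the rules.
DM_PATTERN = re.compile(r"\b(?:diabetes|diabetic|DM|T1DM?|T2DM?)\b", re.IGNORECASE)
ONSET_PATTERN = re.compile(r"\b(?:diagnosed (?:in|with)|onset|since)\b", re.IGNORECASE)
YEAR_PATTERN = re.compile(r"\b(?:19\d{2}|20\d{2})\b")
TYPE_PATTERNS = {
    "T1D": re.compile(r"\b(?:type\s*1|T1DM?)\b", re.IGNORECASE),
    "T2D": re.compile(r"\b(?:type\s*2|T2DM?)\b", re.IGNORECASE),
    "other": re.compile(r"\b(?:MODY|CFRD)\b", re.IGNORECASE),  # assumed examples
}


def split_sentences(note):
    """Naive sentence splitter (assumption): break after ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", note.strip()) if s]


def onset_sentences(notes):
    """Return sentences containing all three concepts: DM, onset attribute, year."""
    return [
        s
        for note in notes
        for s in split_sentences(note)
        if DM_PATTERN.search(s) and ONSET_PATTERN.search(s) and YEAR_PATTERN.search(s)
    ]


def plurality(values):
    """Most frequent value, or None if empty. Ties go to the first value seen."""
    counts = Counter(values)
    return counts.most_common(1)[0][0] if counts else None


def types_in(text):
    """One label per type mention found in the text."""
    return [label for label, pat in TYPE_PATTERNS.items() for _ in pat.finditer(text)]


def classify_patient(notes):
    """Patient-level (dm_type, onset_year); either may be None if not found."""
    hits = onset_sentences(notes)
    # Prefer the type mentioned most often inside the onset sentences.
    dm_type = plurality([t for s in hits for t in types_in(s)])
    if dm_type is None:
        # Fallback: plurality of type mentions across all of the patient's notes.
        dm_type = plurality([t for n in notes for t in types_in(n)])
    onset_year = plurality([int(y) for s in hits for y in YEAR_PATTERN.findall(s)])
    return dm_type, onset_year


example_notes = [
    "Patient was diagnosed with type 1 diabetes in 2008. Doing well on insulin pump.",
    "T1D follow-up. Mother has type 2 diabetes.",
    "History: T1DM diagnosed in 2008. Seen in clinic in 2017 for foot exam.",
]
print(classify_patient(example_notes))  # ('T1D', 2008)
```

In this sketch the 2017 visit year is ignored because its sentence lacks a DM mention and an onset attribute, which is the point of requiring all three concepts in one sentence. A patient with no onset sentence still receives a type through the all-notes fallback but gets no onset year.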