Identifying Health Conditions in Older Adults in Textual Health Records Using Deep Learning-Based Natural Language Processing

Jake Lin, Tomi Korpi, Anna Kuukka, Anna-Katriina Tirkkonen, Antti Kariluoto, Juho Kaijansinkko, Maija Satamo, Hanna Pajulammi, Markus J Haapanen, Sergei Hayrynen, Eetu Pursiainen, Daneil Ciovica, Mikaela B von Bonsdorff, Juulia Jylhava
DOI: https://doi.org/10.1101/2024.10.08.24315141
2024-10-10
Abstract: Many clinically significant health conditions in older adults are underreported or only recorded in unstructured health records. These records, however, contain valuable information for patient care and prognosis. This study utilized 10.6 million free-text entries from the electronic health records of 102,525 patients aged 50 to 80 across various care settings in Finland from 2010 to 2022. A deep learning-based natural language processing model was employed to perform named entity recognition (NER) to identify falls, incontinence, loneliness, and mobility limitations from the free-text entries. The performance of the NER models was evaluated by precision, recall, and F1 scores. Diagnostic codes for incontinence and falls were collected for comparison. Cox regression models were used to assess the predictive value of the identified conditions for all-cause mortality. The NER models demonstrated excellent performance, with recall, precision, and F1 scores greater than 0.80 across the health conditions. Compared to diagnostic codes, NER identified greater numbers of falls (31,987 vs 4,090) and incontinence (7,059 vs 3,873) onsets and yielded greater hazard ratios for all-cause mortality: 1.31 vs 1.04 for falls and 1.99 vs 0.65 for incontinence. Deep learning-based NER models present new opportunities to identify vulnerable patients in free-text health records.
Health Informatics
What problem does this paper attempt to address?
This paper addresses how to identify health conditions in unstructured electronic health records (EHRs) using deep learning-based natural language processing (NLP). It focuses on urinary incontinence, falls, mobility limitations, and loneliness, which are often underreported or undiagnosed in older adults. Information about these conditions is frequently recorded only in free text, yet it matters for patient assessment, care, and prognosis.

Specifically, the research objectives are:

1. **Identify health conditions**: Use named entity recognition (NER) to identify urinary incontinence, falls, mobility limitations, and loneliness in unstructured electronic health records.
2. **Predict all-cause mortality**: Use the identified health conditions to predict patients' all-cause mortality, and compare their predictive value with that of existing diagnostic codes (e.g., ICD-10).

The research design and methods include:

- **Data source**: Electronic health records from public primary, secondary, tertiary, long-term, and home care in the Central Finland Welfare Service County from 2010 to 2022, covering a follow-up period of up to 12 years.
- **Models and techniques**: Google's Bidirectional Encoder Representations from Transformers (BERT) model, pre-trained for Finnish, applied to the named entity recognition task.
- **Performance evaluation**: Comparison of the model's output with manually annotated results using precision, recall, and F1 score.
- **Statistical analysis**: Cox regression models to evaluate and compare how well NER-identified and ICD-10-identified falls and urinary incontinence predict all-cause mortality.

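
The performance evaluation step, comparing model output against manual annotations, can be illustrated with a minimal sketch. It assumes exact-match scoring of entity spans, which is one common convention; the matching criterion actually used in the study is not stated here. The `Entity` type, the label names, and the toy data are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Entity(NamedTuple):
    """A hypothetical annotated span: document id, character offsets, label."""
    doc_id: int
    start: int
    end: int
    label: str


def precision_recall_f1(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Score predictions against gold annotations using exact span matches."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    # Precision: share of predicted entities that are correct.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    # Recall: share of gold entities that were found.
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumed, not from the study): 4 gold entities, 4 predictions, 3 matches.
gold = [
    Entity(1, 10, 17, "FALL"),
    Entity(1, 40, 52, "INCONTINENCE"),
    Entity(2, 5, 13, "LONELINESS"),
    Entity(3, 0, 8, "MOBILITY"),
]
pred = [
    Entity(1, 10, 17, "FALL"),
    Entity(2, 5, 13, "LONELINESS"),
    Entity(3, 0, 8, "MOBILITY"),
    Entity(3, 20, 27, "FALL"),  # false positive: no matching gold span
]

p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With three of four predictions correct and three of four gold entities found, all three metrics come out at 0.75. Per-condition scores like those reported for the study would be obtained by filtering both lists to one label before scoring.
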
The main findings of the research include:

- The deep learning model performed well in identifying falls, urinary incontinence, mobility limitations, and loneliness, with F1 scores of 0.87, 0.81, 0.85, and 0.87, respectively.
- Compared to ICD-10 codes, the NER model identified more falls (31,987 vs 4,090) and urinary incontinence (7,059 vs 3,873) events.
- In predicting all-cause mortality, NER-identified conditions yielded greater hazard ratios than ICD-10 codes: 1.31 vs 1.04 for falls and 1.99 vs 0.65 for urinary incontinence.

The conclusion is that deep learning-based named entity recognition models can reliably identify health conditions in unstructured electronic health records, providing new opportunities for identifying high-risk patients and supporting clinical decision-making.
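
To make the hazard ratios easier to interpret, the standard Cox proportional hazards form is shown below in a simplified single-covariate version. Here $x$ is an assumed binary indicator (1 if the condition was identified for a patient, 0 otherwise), $h_0(t)$ is the baseline hazard, and $\beta$ is the fitted coefficient; any adjustment covariates used in the study are not described here and are omitted.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta x), \qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp(\beta)
```

Read this way, the reported hazard ratio of 1.31 for NER-identified falls corresponds to a roughly 31% higher hazard of death from any cause among patients with an identified fall, while the value of 0.65 for ICD-10-coded incontinence corresponds to a lower hazard in the coded group.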