Filling the gaps: leveraging large language models for temporal harmonization of clinical text across multiple medical visits for clinical prediction

Inyoung Choi, Qi Long, Emily Getzen

DOI: https://doi.org/10.1101/2024.05.06.24306959

2024-05-07

Abstract: Electronic health records offer great promise for early disease detection, treatment evaluation, information discovery, and other important facets of precision health. Clinical notes, in particular, may contain nuanced information about a patient’s condition, treatment plans, and history that structured data may not capture. As a result, and with advancements in natural language processing, clinical notes have been increasingly used in supervised prediction models. To predict long-term outcomes such as chronic disease and mortality, it is often advantageous to leverage data occurring at multiple time points in a patient’s history. However, these data are often collected at irregular time intervals and varying frequencies, thus posing an analytical challenge. Here, we propose the use of large language models (LLMs) for robust temporal harmonization of clinical notes across multiple visits. We compare multiple state-of-the-art LLMs in their ability to generate useful information during time gaps, and evaluate performance in supervised deep learning models for clinical prediction.

Intensive Care and Critical Care Medicine

What problem does this paper attempt to address?

This paper examines how large language models (LLMs) can address the irregular timing of clinical notes in electronic health records (EHRs) and thereby improve clinical prediction. Because patients are seen at inconsistent intervals and with varying frequency, longitudinal EHR data are hard to analyze, and machine learning models trained on them may predict long-term outcomes inaccurately. To tackle this, the authors propose using LLMs to generate useful information for the gaps between visits, giving each patient's clinical notes a more regular temporal structure.

The paper first reviews traditional approaches to irregular time series data, such as zero filling, last observation carried forward (LOCF), and multimodal imputation. It then proposes using LLMs, particularly those trained on biomedical and clinical data, to generate the missing clinical note text and fill the time gaps. The harmonized note sequences are fed into a supervised deep learning model that predicts one-year mortality for intensive care unit and emergency department patients, and the results are compared with those of existing methods.

According to the reported results, GPT-4 achieves the best AUC and F1 scores in both zero-shot and one-shot settings, outperforming the other methods, including multimodal imputation and LOCF. In particular, for patients with a large amount of missing data, filling the gaps with GPT-4 significantly improves model performance. The study also finds that GPT-4 can improve algorithmic fairness across patient populations with different levels of data completeness, because it augments the records of patients with incomplete data.

The paper concludes by discussing the potential limitations of LLMs, such as limited interpretability and possible "hallucinated" outputs, and suggests that future research should focus on improving the interpretability of LLMs in the medical field and on reducing erroneous predictions.
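
The gap-filling strategies named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slot-indexed timeline, the function names (`to_grid`, `zero_fill`, `locf`, `make_llm_filler`, `harmonize`), and the prompt wording are all hypothetical, and the choice to show the model both the previous and the next observed note is an assumption. The `generate` argument stands in for whatever LLM call is available; no specific API is implied.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy representation: slot index (e.g. months since the
# first visit) -> note text. Slots without a note are gaps.
Grid = List[Optional[str]]
Filler = Callable[[Grid, int], str]


def to_grid(notes: Dict[int, str], n_slots: int) -> Grid:
    """Place observed notes on a regular grid; None marks a gap."""
    return [notes.get(t) for t in range(n_slots)]


def zero_fill(grid: Grid, t: int) -> str:
    """Zero filling: a gap receives an empty placeholder."""
    return ""


def locf(grid: Grid, t: int) -> str:
    """LOCF: copy the most recent earlier observed note, if any."""
    for s in range(t - 1, -1, -1):
        if grid[s] is not None:
            return grid[s]
    return ""  # nothing observed before this slot


def make_llm_filler(generate: Callable[[str], str]) -> Filler:
    """Wrap a user-supplied text generator as a gap filler (assumed design)."""
    def fill(grid: Grid, t: int) -> str:
        # Nearest observed notes on either side of the gap (empty if none).
        before = next((grid[s] for s in range(t - 1, -1, -1)
                       if grid[s] is not None), "")
        after = next((grid[s] for s in range(t + 1, len(grid))
                      if grid[s] is not None), "")
        # Illustrative prompt only; not taken from the paper.
        prompt = (
            f"Previous note: {before}\n"
            f"Next note: {after}\n"
            "Write a plausible clinical note for the visit between them."
        )
        return generate(prompt)
    return fill


def harmonize(notes: Dict[int, str], n_slots: int, filler: Filler) -> List[str]:
    """Return one note per slot, keeping observed notes and filling gaps."""
    grid = to_grid(notes, n_slots)
    return [note if note is not None else filler(grid, t)
            for t, note in enumerate(grid)]


if __name__ == "__main__":
    visits = {0: "admitted with chest pain", 3: "stable on beta blocker"}
    print(harmonize(visits, 5, zero_fill))
    print(harmonize(visits, 5, locf))
```

Passing the filler as a function keeps the grid construction identical across strategies, so any difference in downstream model performance would come from the filled text alone.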
