Abstract:
Importance: The use of large language models (LLMs) in medicine is increasing, with potential applications to electronic health records (EHRs), such as creating patient cohorts or identifying patients who meet clinical trial recruitment criteria. However, significant barriers remain, including the extensive computing resources required, a lack of performance evaluation, and challenges in implementation.
Objective: To propose and test a framework for detecting disease diagnoses using a recent lightweight LLM on French-language EHR documents. Specifically, the study focuses on detecting gout ("goutte" in French), a ubiquitous French term that has multiple meanings beyond the disease. It compares the performance of the LLM-based framework with traditional natural language processing techniques and tests its dependence on the parameters used.
Design: The framework was developed using a training and testing set of 700 paragraphs containing the term "goutte", drawn from a random selection of retrospective EHR documents. All paragraphs were manually reviewed by two health care professionals (HCPs) and classified as disease (true gout) or non-disease; this classification served as the gold standard. The LLM's accuracy was tested using few-shot and chain-of-thought prompting and compared with a regular expression (regex)-based method, focusing on the effects of model parameters and prompt structure. The framework was further validated on 600 paragraphs concerning calcium pyrophosphate deposition disease (CPPD).
Setting: The documents were sampled from the electronic health records of a tertiary university hospital in Geneva, Switzerland.
Participants: Adults over 18 years of age.
Exposure: Meta's Llama 3 8B LLM or the traditional regex-based method, each compared against the gold standard.
Main Outcomes and Measures: Positive and negative predictive values, as well as accuracy, of the tested models.
Results: The LLM-based algorithm outperformed the regex method, achieving a positive predictive value of 92.7% [88.7-95.4%], a negative predictive value of 96.6% [94.6-97.8%], and an accuracy of 95.4% [93.6-96.7%] for gout. In the validation set on CPPD, accuracy was 94.1% [90.2-97.6%]. The LLM framework performed well over a wide range of parameter values.
Conclusions and Relevance: LLMs were able to accurately detect disease diagnoses from EHRs, even in a non-English language. They could facilitate the creation of large disease registries in any language, improving disease care assessment and patient recruitment for clinical trials.
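The outcome measures reported above (positive predictive value, negative predictive value, and accuracy, each with an interval) can be illustrated with a minimal Python sketch. It assumes Wilson score intervals at the 95% level; the abstract does not say which interval method was used. The function names and the counts in the usage example are hypothetical and are not taken from the study.

```python
from math import sqrt


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96).

    Assumption: the abstract does not name its interval method; Wilson is
    used here only as one common choice.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return PPV, NPV and accuracy, each as (estimate, (lower, upper))."""
    total = tp + fp + tn + fn
    return {
        # PPV: share of paragraphs flagged as disease that truly are disease
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
        # NPV: share of paragraphs flagged as non-disease that truly are non-disease
        "npv": (tn / (tn + fn), wilson_interval(tn, tn + fn)),
        # Accuracy: share of all paragraphs classified correctly
        "accuracy": ((tp + tn) / total, wilson_interval(tp + tn, total)),
    }


# Hypothetical counts, for illustration only (not the study's data)
metrics = diagnostic_metrics(tp=90, fp=10, tn=180, fn=20)
for name, (estimate, (lower, upper)) in metrics.items():
    print(f"{name}: {estimate:.1%} [{lower:.1%}-{upper:.1%}]")
```

The classifier's output is compared against the gold-standard labels to obtain the four counts; the same function applies to the LLM-based and the regex-based method alike.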
