Abstract: e13620

Background: Accurate assessment of treatment response is fundamental to advancing cancer patient care, particularly in lung cancer, where treatment modalities are diverse and complex. Extracting treatment response, particularly disease progression information, from Electronic Health Records (EHRs) is essential for several reasons: it facilitates the generation of real-world evidence from real-world data, enables personalized treatment planning, and contributes to a broader understanding of cancer therapeutics. However, much treatment response information is documented in free-text EHR notes, and traditional methods of extracting it from notes are labor-intensive, error-prone, and inefficient, presenting significant barriers to timely and accurate real-world evidence studies. Advances in Natural Language Processing (NLP), especially Large Language Models (LLMs), present an opportunity to automate the extraction of treatment responses.

Methods: We used a cohort of 1,953 primary lung cancer patients identified from the UPMC Hillman Cancer Center cancer registry and retrieved ~113,000 clinical notes from the UPMC clinical data warehouse. We focused on extracting disease progression information, following the RECIST guidelines, which define progression as an increase in tumor size or cancer markers after therapy. An experienced oncologist manually annotated ~50 selected notes to create a gold standard dataset for validating the NLP models. We fine-tuned a state-of-the-art open-source LLM, LLAMA-2, and compared it against a traditional rule-based NLP system for the automated extraction of disease progression from notes. Fine-tuning involved adjusting the model's parameters specifically to improve its ability to recognize and classify instances of disease progression.

Results: The LLM performed substantially better than the traditional rule-based NLP system. Its sensitivity was ~37% higher, indicating a superior ability to identify disease progression. The LLM also maintained high specificity and positive predictive value (PPV), with scores nearly comparable to those of the rule-based system, and improved the F1-score by ~14%.

Conclusions: Our research highlights the potential of LLM-based NLP algorithms to automate the extraction of treatment responses from free-text EHRs. This methodology not only provides a scalable and efficient mechanism for processing large volumes of clinical text but also significantly enhances the accuracy of lung cancer treatment response assessments. [Table: see text]
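The four measures reported in the Results (sensitivity, specificity, PPV, and F1-score) can all be derived from a note-level comparison of model output against the gold standard annotations. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `evaluate`, the `Metrics` container, and the example labels are hypothetical and do not reproduce the study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    f1: float


def evaluate(gold: Sequence[int], pred: Sequence[int]) -> Metrics:
    """Compare binary predictions (1 = progression, 0 = no progression)
    against gold standard labels for the same notes."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # share of true progressions found
    specificity = ratio(tn, tn + fp)  # share of non-progressions kept negative
    ppv = ratio(tp, tp + fp)          # share of flagged notes that are correct
    f1 = ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return Metrics(sensitivity, specificity, ppv, f1)


# Hypothetical labels for eight notes (illustrative only).
gold_labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_labels = [1, 0, 1, 0, 1, 0, 0, 1]
print(evaluate(gold_labels, model_labels))
```

Running the same function on the outputs of the rule-based system and of the fine-tuned LLM over the same annotated notes would give the kind of side-by-side comparison the abstract summarizes. With a gold standard of roughly 50 notes, each metric rests on small counts, so differences between systems should be read with that in mind.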

Automating the detection of treatment progression in patients with lung cancer using large language models.
