Abstract: Generating discharge summaries is a crucial yet time-consuming task in clinical practice, essential for conveying pertinent patient information and supporting continuity of care. Recent advances in large language models (LLMs) have significantly improved their ability to understand and summarize complex medical texts. This research explores how LLMs can reduce the burden of manual summarization, streamline clinical workflows, and support informed decision-making in healthcare settings. Clinical notes from a cohort of 1,099 lung cancer patients were used, with a subset of 50 patients used for testing and 102 patients used for model fine-tuning. The study evaluates multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and LLaMA 3 8B, on generating discharge summaries. Evaluation metrics included token-level measures (BLEU, ROUGE-1, ROUGE-2, ROUGE-L) and semantic similarity scores between model-generated summaries and physician-written gold standards. LLaMA 3 8B was further tested on clinical notes of varying lengths to examine the stability of its performance. Summarization capability varied notably among the LLMs. GPT-4o and fine-tuned LLaMA 3 achieved the best token-level scores, while LLaMA 3 consistently produced concise summaries across different input lengths. Semantic similarity scores likewise placed GPT-4o and LLaMA 3 ahead of the other models in capturing clinically relevant content. This study contributes insights into the efficacy of LLMs for generating discharge summaries, highlighting LLaMA 3's robust performance in maintaining clarity and relevance across varying clinical contexts. These findings underscore the potential of automated summarization tools to improve the precision and efficiency of clinical documentation, and ultimately patient care and operational capability in healthcare settings.
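To make the token-level metrics named in the abstract concrete, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L F1 scores can be computed between a generated summary and a reference summary. It is not the evaluation pipeline used in the study: the function names, the lowercase whitespace tokenization, and the example sentences are all assumptions made for illustration, and published ROUGE implementations typically add stemming and other normalization that is omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """F1 of precision (overlap / candidate) and recall (overlap / reference)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 based on clipped n-gram overlap (assumed whitespace tokens)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the study's data.
    generated = "patient discharged home in stable condition"
    gold = "patient was discharged home in stable condition"
    for name, score in [
        ("ROUGE-1", rouge_n(generated, gold, 1)),
        ("ROUGE-2", rouge_n(generated, gold, 2)),
        ("ROUGE-L", rouge_l(generated, gold)),
    ]:
        print(f"{name}: {score:.3f}")
```

Scores of this kind reward surface overlap with the physician-written text, which is presumably why the study pairs them with semantic similarity scores: two summaries can convey the same clinical content with little shared wording.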

Adapting Large Language Models for Automated Summarisation of Electronic Medical Records in Clinical Coding

Adapted large language models can outperform medical experts in clinical text summarization

Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding

Sexual hormone fluctuation in chinchillas.

A Dataset and Benchmark for Hospital Course Summarization with Adapted Large Language Models

Improving Clinical Expertise in Large Language Models Using Electronic Medical Records

Enhanced Electronic Health Records Text Summarization Using Large Language Models

Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review

Critical Care Studies Using Large Language Models Based on Electronic Healthcare Records: A Technical Note

A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients

Patient Centric Summarization of Radiology Findings using Large Language Models

RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models

Can Large Language Models Replace Data Scientists in Clinical Research?

How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling

Evaluation of large language models performance against humans for summarizing MRI knee radiology reports: A feasibility study

Synoptic Reporting by Summarizing Cancer Pathology Reports using Large Language Models

Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data

A Framework to Assess Clinical Safety and Hallucination Rates of LLMs for Medical Text Summarisation

The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models

Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data