Abstract: Generating discharge summaries is a crucial yet time-consuming task in clinical practice, essential for conveying pertinent patient information and supporting continuity of care. Recent advances in large language models (LLMs) have significantly improved their ability to understand and summarize complex medical texts. This research explores how LLMs can reduce the burden of manual summarization, streamline workflows, and support informed decision-making in healthcare settings. Clinical notes from a cohort of 1,099 lung cancer patients were used, with a subset of 50 patients reserved for testing and 102 patients used for model fine-tuning. The study evaluates the performance of multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and LLaMA 3 8b, in generating discharge summaries. Evaluation metrics included token-level measures (BLEU, ROUGE-1, ROUGE-2, ROUGE-L) and semantic similarity scores between model-generated summaries and physician-written gold standards. LLaMA 3 8b was further tested on clinical notes of varying lengths to examine the stability of its performance. The study found notable variation in summarization capability among the LLMs. GPT-4o and fine-tuned LLaMA 3 achieved the best token-level scores, while LLaMA 3 consistently produced concise summaries across different input lengths. Semantic similarity scores indicated that GPT-4o and LLaMA 3 led the other models in capturing clinical relevance. This study contributes insights into the efficacy of LLMs for generating discharge summaries, highlighting LLaMA 3's robust performance in maintaining clarity and relevance across varying clinical contexts. These findings underscore the potential of automated summarization tools to improve documentation precision and efficiency, and ultimately patient care and operational capability in healthcare settings.
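To make the token-level metrics named in the abstract concrete, the following is a minimal sketch of ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence) in plain Python. It assumes simple lowercased whitespace tokenization and a single reference per summary; the abstract does not state which implementation or preprocessing the authors used, and the function names and example sentences here are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, cand_total, ref_total):
    """Turn an overlap count into precision, recall, and F1."""
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(candidate, reference, n=1):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    # Assumption: lowercased whitespace tokenization, no stemming.
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: precision, recall, and F1 based on the LCS length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return prf(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical sentences for illustration only; not data from the study.
generated = "patient discharged home in stable condition"
gold = "the patient was discharged home in stable condition"

for name, score in [
    ("ROUGE-1", rouge_n(generated, gold, 1)),
    ("ROUGE-2", rouge_n(generated, gold, 2)),
    ("ROUGE-L", rouge_l(generated, gold)),
]:
    print(name, {k: round(v, 3) for k, v in score.items()})
```

Published evaluations typically rely on an established metric package rather than a hand-written version, and choices such as stemming or tokenization can shift the scores, so numbers from a sketch like this are not directly comparable to reported results.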

Automation of Trainable Datasets Generation for Medical-Specific Language Model: Using MIMIC-IV Discharge Notes

A New Pipeline For Generating Instruction Dataset via RAG and Self Fine-Tuning

WisPerMed at "Discharge Me!": Advancing Text Generation in Healthcare with Large Language Models, Dynamic Expert Selection, and Priming Techniques on MIMIC-IV

Towards Understanding the Generalization of Medical Text-to-SQL Models and Datasets

e-Health CSIRO at "Discharge Me!" 2024: Generating Discharge Summary Sections with Fine-tuned Language Models

Annotated dataset creation through large language models for non-english medical NLP

Constructing synthetic datasets with generative artificial intelligence to train large language models to classify acute renal failure from clinical notes

Enhancing Clinical Efficiency through LLM: Discharge Note Generation for Cardiac Patients

Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP

LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction for Language Models

A Dataset and Benchmark for Hospital Course Summarization with Adapted Large Language Models

Natural language processing of MIMIC-III clinical notes for identifying diagnosis and procedures with neural networks

CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports

Generation of Synthetic Electronic Medical Record Text

Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study

A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients

A Method for Generating Synthetic Electronic Medical Record Text

Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding

Large Language Model Benchmarks in Medical Tasks

Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach