Abstract: Large language models (LLMs), a subset of artificial intelligence (AI) models trained on massive data sets that can synthesize and generate human-like text, have attracted tremendous interest in health care. 1 Researchers and health care professionals are actively exploring ways to integrate LLMs into clinical administration tasks, clinical note generation, diagnostic support, and other activities. In addition to the Generative Pre-trained Transformer (GPT), there are other general-purpose models, such as Bidirectional Encoder Representations from Transformers (BERT) and the Pathways Language Model (PaLM), that have fine-tuned variants for medical and biomedical domains. There are also models trained exclusively on clinical text corpora, such as GatorTron and NYUTron, that have shown early promise in predicting hospital length of stay, in-hospital mortality, and hospital readmissions. 2 Effective patient communication is critical in health care, and given the significant role that written text plays in this context, text-based AI models present numerous possibilities for enhancing patient communication. 3 Organizational health literacy is the degree to which health care organizations implement strategies to make it easier for patients to understand health information, navigate the health care system, and manage their health. 4 Central to achieving organizational health literacy is the enhancement of patient-oriented written communication. Studies have demonstrated that improving the readability of such communications is positively associated with patient outcomes, 5 and LLMs could be a useful tool here. In this context, the study by Zaretsky and team 6 addresses an important task: using an LLM, GPT-4 (OpenAI), to transform hospital discharge summaries into a format that is readable for patients.
Zaretsky et al 6 reviewed existing summaries and patient preferences to determine which elements should be included or excluded in the new format, processed the original summaries to retain only the specific patient-relevant elements (eg, removing billing codes), and input the processed summary into the LLM with instructions, refined through prompt engineering, to produce a concise, 1-page, patient-readable summary. These revised summaries were then assessed for their readability and understandability, coupled with the balancing metrics of accuracy and completeness. Zaretsky et al 6 found that the LLM-generated summaries were shorter, more readable, and more understandable than the original discharge summaries. More than half of discharge summaries (54 of 100 summaries [54%]) were transformed successfully into a patient-readable format with top ratings in accuracy; yet 18 summaries were flagged for potential safety risks. Furthermore, among the 46 reviews flagged as less accurate, 24 (52%) were due to omissions and 4 (9%) were due to hallucinations. 6 These findings improve on readability scores and rates of hallucinations found in other patient education studies using LLMs. 7,8 Given the principle of primum non nocere, the potential safety risk that Zaretsky et al 6 found is concerning. It is perhaps understandable that omissions occurred, given the strict word restrictions needed to fit the discharge summary into a single page. It is conceivable that allowing a longer discharge summary could have reduced rates of omission. While not evaluated in this study, 6 it is also possible that human clinicians might omit key clinical elements in a brief discharge summary. More concerning, despite their relative rarity, was the presence of hallucinations, particularly given the high confidence with which they were presented. 6
While the impact of LLM-generated hallucinations in health care remains unclear, hallucinations that are patient facing are more concerning than those facing clinicians, who may be better able to identify and correct them. Inaccuracies noted in this study by Zaretsky et al, 6 such as mentioning nonexistent infections or chest pain, not only pose safety risks but also threaten the trust between patients and health care practitioners. In addition to these safety concerns, the practical challenges of implementing such a tool in routine clinical practice must be considered. Manual processing of the discharge summaries to remove unnecessary elements is unlikely to be feasible on a busy clinical service, although there are technical solutions that could automate some or all of this work. More substantively, Zaretsky et al 6 undertook an extensive 6-week process of prompt engineering, and it is unclear to what extent their work could be reused directly or whether it would require additional reengineering at other institutions with different practice patterns and disease burdens. With that said, Zaretsky -Abstract Truncated-
