Enhancing Health Care Communication With Large Language Models

Charumathi Raghu Subramanian, Daniel A. Yang, Raman Khanna
DOI: https://doi.org/10.1001/jamanetworkopen.2024.0347
2024-03-12
JAMA Network Open
Abstract: Large language models (LLMs), a subset of artificial intelligence (AI) models trained on massive data sets that can synthesize and generate human-like text, have attracted tremendous interest in health care.¹ Researchers and health care professionals are actively exploring ways to use LLMs for clinical administrative tasks, clinical note generation, diagnostic support, and other activities. Besides the Generative Pre-trained Transformer (GPT), other general-purpose models have variants fine-tuned for the medical and biomedical domains, including Bidirectional Encoder Representations from Transformers (BERT) and the Pathways Language Model (PaLM). Some models, such as GatorTron and NYUTron, are trained exclusively on clinical text corpora and have shown early promise in predicting hospital length of stay, in-hospital mortality, and hospital readmissions.² Effective patient communication is critical in health care, and because written text plays a significant role in that communication, text-based AI models offer numerous possibilities for enhancing it.³ Organizational health literacy is the degree to which health care organizations implement strategies that make it easier for patients to understand health information, navigate the health care system, and manage their health.⁴ Central to achieving organizational health literacy is improving patient-oriented written communication. Studies have shown that more readable communication of this kind is associated with better patient outcomes,⁵ and LLMs could be a tool for improving it. In this context, the study by Zaretsky and colleagues⁶ addresses an important task: using an LLM, GPT-4 (OpenAI), to transform hospital discharge summaries into a format that is readable for patients.
Zaretsky et al⁶ reviewed existing summaries and patient preferences to determine which elements should be included in or excluded from the new format. They then processed the original summaries to retain only the patient-relevant elements (eg, removing billing codes) and entered each processed summary into the LLM. The accompanying instructions, refined through prompt engineering, asked the model to produce a concise, 1-page, patient-readable summary. The resulting summaries were assessed for readability and understandability, alongside the balancing metrics of accuracy and completeness. Zaretsky et al⁶ found that the LLM-generated summaries were shorter, more readable, and more understandable than the original discharge summaries. More than half of the discharge summaries (54 of 100 [54%]) were successfully transformed into a patient-readable format with top accuracy ratings; yet 18 summaries were flagged for potential safety risks. Furthermore, among the 46 reviews flagged as less accurate, 24 (52%) were due to omissions and 4 (9%) were due to hallucinations.⁶ These readability scores and hallucination rates compare favorably with those reported in other patient education studies using LLMs.⁷,⁸ Given the principle of primum non nocere (first, do no harm), the potential safety risk that Zaretsky et al⁶ found is concerning. The omissions are perhaps understandable, given the strict word limits needed to fit the discharge summary onto a single page, and it is conceivable that allowing a longer summary could have reduced the rate of omission. Although this was not evaluated in the study,⁶ it is also possible that human clinicians would omit key clinical elements when writing a brief discharge summary. More concerning, despite their relative rarity, were the hallucinations, particularly given the high confidence with which they were presented.⁶
While the impact of LLM-generated hallucinations in health care remains unclear, patient-facing hallucinations are more concerning than clinician-facing ones, because clinicians may be better able to identify and correct them. Inaccuracies noted by Zaretsky et al,⁶ such as mentions of nonexistent infections or chest pain, not only pose safety risks but also threaten the trust between patients and health care practitioners. Beyond these safety concerns, the practical challenges of implementing such a tool in routine clinical practice must be considered. Manual processing of discharge summaries to remove unnecessary elements is unlikely to be feasible on a busy clinical service, although technical solutions could automate some or all of this work. More substantively, Zaretsky et al⁶ undertook an extensive 6-week prompt engineering process, and it is unclear to what extent their work could be reused directly or would instead require additional reengineering at other institutions with different practice patterns and disease burdens. With that said, Zaretsky… (abstract truncated)
medicine, general & internal