Empathy and Clarity in GPT-4-Generated Emergency Department Discharge Letters

Gal Ben Haim, Adva Livne, Uri Manor, David Hochstein, Mor Saban, Orly Blaier, Yael Abramov Iram, Moran Gigi Balzam, Ariel Lutenberg, Rowand Eyade, Roula Qassem, Dan Trabelsi, Yarden Dahari, Ben Zion Eisenmann, Yelena Shechtman, Girish Nadkarni, Benjamin S Glicksberg, Eyal Zimlichman, Anat Perry, Eyal Klang
DOI: https://doi.org/10.1101/2024.10.07.24315034
2024-10-07
Abstract: Background and Aim: The potential of large language models (LLMs) like GPT-4 to generate clear and empathetic medical documentation is becoming increasingly relevant. This study evaluates these constructs in discharge letters generated by GPT-4 compared to those written by emergency department (ED) physicians. Methods: In this retrospective, blinded study, 72 discharge letters written by ED physicians were compared to GPT-4-generated versions, which were based on the physicians' follow-up notes in the electronic medical record (EMR). Seventeen evaluators (7 physicians, 5 nurses, and 5 patients) were asked to select their preferred letter (human or LLM) for each patient and rate empathy, clarity, and overall quality using a 5-point Likert scale (1 = Poor, 5 = Excellent). A secondary analysis by 3 ED attending physicians assessed the medical accuracy of both sets of letters. Results: Across the 72 comparisons, evaluators preferred GPT-4-generated letters in 1,009 out of 1,206 evaluations (83.7%). GPT-4 letters were rated significantly higher for empathy, clarity, and overall quality (p < 0.001). Additionally, GPT-4-generated letters demonstrated superior medical accuracy, with a median score of 5.0 compared to 4.0 for physician-written letters (p = 0.025). Conclusion: GPT-4 shows strong potential in generating ED discharge letters that are empathetic and clear and that are preferred by healthcare professionals and patients, offering a promising tool to reduce the workload of ED physicians. However, further research is necessary to explore patient perceptions and best practices for leveraging the advantages of AI together with physicians in clinical practice.
Emergency Medicine
What problem does this paper attempt to address?
This paper asks whether GPT-4 can generate emergency department (ED) discharge letters that match or exceed physician-written letters in empathy, clarity, and overall quality. In a retrospective, blinded comparison, 72 physician-written letters were set against GPT-4-generated versions based on the physicians' follow-up notes in the electronic medical record, and 17 evaluators (physicians, nurses, and patients) chose their preferred letter and rated each on a 5-point Likert scale. A secondary analysis by 3 ED attending physicians assessed the medical accuracy of both sets of letters. The broader question is whether AI-generated discharge letters could improve the quality of medical documentation in clinical settings and reduce the workload of ED physicians. Evaluators preferred the GPT-4 letters in 83.7% of evaluations and rated them significantly higher for empathy, clarity, and overall quality (p < 0.001); the GPT-4 letters also received a higher median medical accuracy score (5.0 versus 4.0, p = 0.025).
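
The headline preference figure can be reproduced from the counts given in the abstract (1,009 of 1,206 evaluations). The short Python sketch below recomputes that share and, as an illustration only, attaches a 95% Wilson score interval. The interval is not reported in the abstract, the function names are hypothetical, and the calculation treats evaluations as independent, which the study design (the same evaluators rating many letters) does not guarantee.

```python
import math

# Counts taken from the abstract.
PREFERRED_GPT4 = 1009
TOTAL_EVALUATIONS = 1206


def preference_share(preferred: int, total: int) -> float:
    """Fraction of evaluations in which the GPT-4 letter was preferred."""
    return preferred / total


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: evaluations are independent draws. This is a simplification
    added for illustration, not the analysis used in the paper.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


share = preference_share(PREFERRED_GPT4, TOTAL_EVALUATIONS)
low, high = wilson_interval(PREFERRED_GPT4, TOTAL_EVALUATIONS)
print(f"Preferred GPT-4 letter: {share:.1%}")
print(f"Illustrative 95% Wilson interval: {low:.3f} to {high:.3f}")
```

The first printed line matches the 83.7% stated in the abstract. The interval should be read only as a rough indication of sampling uncertainty under the independence assumption.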