Are AI-Generated Replies the Answer to the EHR Inbox Problem?

Lisa S. Rotenstein, Robert M. Wachter
DOI: https://doi.org/10.1001/jamanetworkopen.2024.38528
2024-10-15
JAMA Network Open
Abstract: The use of patient portals has increased markedly in recent years. According to the Office of the National Coordinator for Health Information Technology, in 2022, about 60% of individuals accessed their online medical records or patient portals, up from 25% in 2014. 1 Engagement with electronic health record (EHR)–based patient portals has multiple benefits for patients: it enables them to see their test results, update or correct their health information, access educational materials, and correspond with their care teams. However, this engagement has also had unexpected consequences for health care systems and clinicians. As the prevalence and frequency of patient portal use have increased, so too has the number of messages (EHR inbox messages) sent by patients to their care teams. Over the past few years, health care systems have realized that they have essentially enabled around-the-clock patient access to these teams without adequately preparing for the workflow, workforce, and financial implications of this access. The work associated with EHR inbox messages has disproportionately affected primary care clinicians, who receive 5 times as many messages as their surgical counterparts. 2 Unsurprisingly, patient portal use and messaging accelerated during the COVID-19 pandemic. 3 Health care systems have implemented various solutions to manage the growing volume of EHR inbox messages, 4 including hiring nonphysician clinicians, developing triage strategies, and even charging for clinician answers to messages deemed complex. More recently, the use of artificial intelligence (AI)–generated responses to patient messages has been of particular interest. Responses drafted by large language model (LLM) technology could reduce the messaging burden for care teams while maintaining patient engagement. 
However, little evidence has been available to assess the quality of such responses and the specifics of their implementation in complex health care practices. The quality improvement study by English et al, 5 which characterizes the use of and perceptions regarding LLM-drafted replies to EHR inbox messages across 9 clinic sites and multiple clinical role types, advances our understanding of this new AI application. Notably, the study 5 also provides insight into the iterative process used by a large academic health system to refine the AI-generated replies after rollout. The study's principal findings were that, overall, a small proportion (12%) of AI-generated message drafts were used and that perceptions of the value of the draft messages varied substantially by role type, with nurses holding more favorable views. 5 The study 5 raises several points to reflect on. First, it underscores the importance of deliberate adoption of LLM-based tools, accompanied by careful analysis of end users' experiences. For example, the authors 5 describe how providing the assessment and plan from the last clinic note to the LLM improved the draft replies. Given the constant updating of LLMs and the frequency of unanticipated consequences, it will be crucial for health care systems to undertake this type of iteration, considering the possibility (perhaps the likelihood) that one size of LLM responses will not fit all situations, specialties, or the needs of all clinical role types. Notably, among care team members who trialed the draft replies and were surveyed, nurses had the most positive perceptions of the AI-generated replies. 5 They were significantly more likely than physicians and advanced practice clinicians to recommend the draft replies to others, to perceive that the technology allowed them to address more messages themselves, and to perceive that the draft messages helped them stay within their scope of practice. 5
These differences in perceptions suggest that current LLM-based responses may be best suited for protocoled replies. In contrast, the LLMs used in the study seemed less helpful in addressing queries that deviated from protocols or required nuanced expertise, such as those often left to physicians to answer in a team-based care environment. Additionally, AI-generated replies could be detrimental for higher-complexity queries if they include hallucinations or make suggestions that go beyond their appropriate scope of knowledge. Future studies pinpointing for whom LLM-generated replies are most valuable will be crucial as operational leaders seek to optimize the use of this technology. Finally, the results of this study 5 underscore the need for continued clinician vigilance over AI-drafted replies (and likely over many other AI-powered technologies). 6 The authors 5 provide instances in which the drafted replies represented a response more suited to a specialty clinic or recommended a nonsensical action. While prompts were update -Abstract Truncated-
medicine, general & internal
What problem does this paper attempt to address?