Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges

Felix Busch, Lena Hoffmann, Christopher Rueger, Elon HC van Dijk, Rawen Kader, Esteban Ortiz-Prado, Marcus R Makowski, Luca Saba, Martin Hadamitzky, Jakob Nikolas Kather, Daniel Truhn, Renato Cuocolo, Lisa C Adams, Keno K Bressem
DOI: https://doi.org/10.1101/2024.03.04.24303733
2024-03-05
Abstract: The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize current applications and limitations of LLMs in patient care using a data-driven convergent synthesis approach. We searched 5 databases for qualitative, quantitative, and mixed-methods articles on LLMs in patient care published between 2022 and 2023. From 4,349 initial records, 89 studies across 29 medical specialties were included, primarily examining models based on the GPT-3.5 (53.2%, n=66 of 124 different LLMs examined per study) and GPT-4 (26.6%, n=33/124) architectures in medical question answering, followed by patient information generation, including medical text summarization or translation, and clinical documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations included 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations included 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. In conclusion, this study is the first review to systematically map LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.
Health Informatics
What problem does this paper attempt to address?
This paper reviews the applications and limitations of large language models (LLMs) in patient care. As LLMs such as ChatGPT have become popular, they are expected to improve patient education, support personalized medicine, and broaden access to medical knowledge. Despite this interest in their clinical potential, systematic information on how they are actually used in patient care is still lacking. To address this gap, the authors conducted a systematic review of studies published between 2022 and 2023, searching five databases and selecting articles according to predetermined criteria. The analysis covers the use of LLMs in disease management, support, and patient communication, together with the associated limitations. Most of the examined models are based on the GPT-3.5 and GPT-4 architectures and are used primarily for medical question answering, followed by patient information generation (such as text summarization and translation) and clinical documentation. The limitations fall into two domains: design and output. Design limitations include a lack of optimization for the medical domain and problems with data transparency and accessibility. Output limitations include non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. The paper thus provides a systematic overview of LLM applications and limitations in patient care, laying a foundation for their implementation and evaluation in healthcare settings. Future research should focus on open-source LLMs to overcome the transparency and reliability issues of proprietary models, and should address output accuracy and patient comprehension, so that LLMs can be applied safely and effectively in healthcare.