Abstract

Background: A large language model (LLM) is a machine learning model learned from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods, which allow the model to relate words to one another by attending to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of natural language processing (NLP) tasks, including classification, information extraction, and generative applications.

Objective: The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care.

Methods: We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats (SWOT) framework. In the second and third rounds, the participants scored the resulting items.

Results: The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including support for clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks were the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access to or use of patient data.

Conclusions: Future research on LLMs should not only test their capabilities for NLP tasks but also consider the workflows the models could contribute to and the requirements for quality, integration, and regulation that successful implementation in practice will demand.
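The Background describes the transformer's attention mechanism only in words. As a rough illustration, the sketch below implements single-head scaled dot-product attention, the core transformer operation, in plain Python. It is not the architecture of any particular LLM, and it makes several simplifying assumptions. The three 2-dimensional "token embeddings" are invented toy values. Those embeddings are used directly as queries, keys, and values. Real transformers, by contrast, apply learned projection matrices and stack many heads and layers.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Single-head attention: each query mixes all value vectors,
    weighted by softmax(q . k / sqrt(d_k))."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key in the sequence, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)  # attention weights for this query; they sum to 1
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


if __name__ == "__main__":
    # Hypothetical toy embeddings for a 3-token sequence (invented values).
    tokens = ["patient", "reports", "fever"]
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Simplified self-attention: the same vectors serve as queries, keys, and values.
    _, attn = scaled_dot_product_attention(x, x, x)
    for tok, row in zip(tokens, attn):
        print(tok, [round(a, 3) for a in row])
```

Each printed row shows how strongly one token attends to every token in the sequence, and each row sums to 1. This weighting is what lets a token's representation draw on several other words at once.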

Potential of Large Language Models in Health Care: Delphi Study