Abstract:

Background: A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods. These methods allow the model to relate words to one another through attention to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications.

Objective: The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care.

Methods: We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. In the second and third rounds, the participants scored these items.

Results: The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data.

Conclusions: Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice.

The TRIPOD-LLM Statement: A Targeted Guideline For Reporting Large Language Models Use

A Survey of Large Language Models in Medicine: Progress, Application, and Challenge

The long but necessary road to responsible use of large language models in healthcare research

Demystifying Large Language Models for Medicine: A Primer

Potential of Large Language Models in Health Care: Delphi Study

Large Language Models in Medicine: The Potentials and Pitfalls

Large Language Model Augmented Clinical Trial Screening

Large Language Models for Simplified Interventional Radiology Reports: A Comparative Analysis

Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement

Understanding and training for the impact of large language models and artificial intelligence in healthcare practice: a narrative review

GPT for RCTs? Using AI to measure adherence to reporting guidelines

A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics

Large language models for structured reporting in radiology: past, present, and future

Using large language models for safety-related table summarization in clinical study reports

Large language models reshaping molecular biology and drug development

Zero-shot learning to extract assessment criteria and medical services from the preventive healthcare guidelines using large language models

Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence

Large Language Model-Driven Evaluation of Medical Records Using MedCheckLLM

Large Language Model Prompting Techniques for Advancement in Clinical Medicine

Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges

LLMD: A Large Language Model for Interpreting Longitudinal Medical Records