Abstract: Large language models (LLMs), such as ChatGPT, have received substantial attention due to their capabilities for understanding and generating human language. While a growing body of research focuses on using LLMs to support different medical tasks (e.g., enhancing clinical diagnostics and providing medical education), reviews of these efforts, particularly their development, practical applications, and outcomes in medicine, remain scarce. Therefore, this review aims to provide a detailed overview of the development and deployment of LLMs in medicine, including the challenges and opportunities they face. In terms of development, we provide a detailed introduction to the principles of existing medical LLMs, including their basic model structures, number of parameters, and sources and scales of data used for model development. This introduction serves as a guide for practitioners in developing medical LLMs tailored to their specific needs. In terms of deployment, we offer a comparison of the performance of different LLMs across various medical tasks, and further compare them with state-of-the-art lightweight models, aiming to provide an understanding of the advantages and limitations of LLMs in medicine. Overall, in this review, we address the following questions: 1) What are the practices for developing medical LLMs? 2) How to measure the medical task performance of LLMs in a medical setting? 3) How have medical LLMs been employed in real-world practice? 4) What challenges arise from the use of medical LLMs? and 5) How to more effectively develop and deploy medical LLMs? By answering these questions, this review aims to provide insights into the opportunities for LLMs in medicine and serve as a practical resource. We also maintain a regularly updated list of practical guides on medical LLMs at:

What problem does this paper attempt to address?

The paper attempts to address the following key issues:

1. **Development status of large language models (LLMs) in the medical field**: The paper reviews the development of LLMs in the medical field, including the basic architecture of these models, their parameter counts, and the sources and scale of the data used for model development. This provides guidance for researchers and clinicians designing medical LLMs that meet specific needs.
2. **Performance evaluation of medical tasks**: The paper explores how to measure the performance of LLMs in medical settings across different medical tasks. By comparing LLMs with state-of-the-art lightweight models, the paper aims to show the advantages and limitations of LLMs in the medical field.
3. **Practical applications**: The paper analyzes how medical LLMs are applied in actual medical practice, including scenarios such as electronic health records (EHRs), discharge summary generation, health education, and care planning. This helps readers understand the practicality and effectiveness of these models in real clinical environments.
4. **Challenges faced**: The paper discusses the main challenges in implementing LLMs in medical practice, such as hallucinations (factually inaccurate but seemingly plausible outputs) and ethical, legal, and safety issues. The paper emphasizes the need for a comprehensive evaluation framework to ensure the reliability and effective use of medical LLMs.
5. **Future development directions**: The paper proposes ways to improve the construction of medical LLMs so that they are more applicable in clinical environments, ultimately contributing to medicine and creating a positive social impact. These include promoting interdisciplinary collaboration between AI experts and medical professionals, advocating for a "doctor-in-the-loop" approach, and emphasizing human-centered design principles.

By addressing these issues, the paper aims to provide insights into the opportunities and challenges of LLMs in the medical field and to serve as a practical resource for building effective medical LLMs.

A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
