Abstract:
Introduction: In the past year, large language models (LLMs) have generated significant interest and excitement because of their potential to revolutionise various fields, including medical education for aspiring physicians. Medical students undergo a demanding educational process to become competent health care professionals, and LLMs present a promising solution to challenges such as information overload, time constraints and pressure on clinical educators. However, integrating LLMs into medical education raises critical concerns and challenges for educators, professionals and students. This systematic review aims to explore LLM applications in medical education, specifically their impact on medical students' learning experiences.
Methods: PubMed, Web of Science and Embase were systematically searched, using selected keywords related to LLMs and medical education, for articles discussing the applications of LLMs in medical education published between ChatGPT's debut and February 2024. Only articles available in full text or English were reviewed. The credibility of each study was critically appraised by two independent reviewers.
Results: The systematic search identified 166 studies, of which 40 were judged relevant on review. Among the 40 relevant studies, key themes included LLM capabilities, benefits such as personalised learning and challenges regarding content accuracy. Importantly, 42.5% of these studies specifically evaluated LLMs, including ChatGPT, in a novel way, in contexts such as medical exams and clinical/biomedical information, highlighting their potential to replicate human-level performance in medical knowledge. The remaining studies broadly discussed the prospective role of LLMs in medical education, reflecting a keen interest in their future potential despite current constraints.
Conclusions: The responsible implementation of LLMs in medical education offers a promising opportunity to enhance learning experiences. However, ensuring information accuracy, emphasising skill-building and maintaining ethical safeguards are crucial. Continuous critical evaluation and interdisciplinary collaboration are essential for the appropriate integration of LLMs in medical education.
