Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis (Preprint)

Bairong Shen, Hui Zong, Rongrong Wu, Jiaxue Cha, Jiao Wang, Erman Wu, Jiakun Li, Yi Zhou, Chi Zhang, Weizhe Feng
DOI: https://doi.org/10.2196/preprints.66114
2024-01-01
Abstract: Background: The integration of large language models (LLMs) into medical education has demonstrated tremendous potential, with significant implications for learning and assessment. Objective: This study presents MedExamLLM, a comprehensive platform designed to systematically evaluate the performance of LLMs across a diverse range of medical exams conducted globally. Methods: We performed a systematic search of the PubMed database to identify relevant publications. Two researchers independently screened the candidate publications to ensure accuracy and reliability. We manually curated, standardized, and organized the data, including exam information, data processing information, model performance, data availability, and references. The web platform was developed using Streamlit, Bootstrap, and Apache ECharts. Results: MedExamLLM is an open-source, freely accessible, publicly available online platform that provides comprehensive performance evaluation information and supporting evidence for LLMs on medical exams around the world. MedExamLLM covers 16 LLMs on 198 medical exams conducted across 28 countries in 15 languages from 2009 to 2023. The United States leads in the number of medical exams and publications, and English is the primary language used in these exams. The GPT series models, especially GPT-4, outperform other models, achieving significantly higher pass rates. The analysis reveals significant variability in the capabilities of LLMs across different geographic and linguistic contexts. Conclusions: The MedExamLLM platform serves as a valuable resource for educators, researchers, and developers in the fields of clinical medicine and artificial intelligence. By providing insights into the capabilities of LLMs in medical exams around the world, MedExamLLM not only contributes to the growing body of knowledge on LLMs in education but also supports the future integration of artificial intelligence technologies into medical education.