Abstract:
Objectives: Large language models (LLMs) such as ChatGPT and Med-PaLM have performed well on a range of medical question-answering tasks. However, these English-centric models struggle in non-English clinical settings. The main reason is that imbalanced training corpora leave them with limited clinical knowledge in those languages. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to improve their performance.

Materials and Methods: The latest China National Medical Licensing Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books and 381,149 medical questions to build a medical knowledge base and a question bank. The proposed Knowledge and Few-shot Enhancement In-context Learning (KFE) framework uses the in-context learning ability of LLMs to integrate diverse external sources of clinical knowledge. We evaluated KFE with ChatGPT (GPT-3.5), GPT-4, Baichuan2 (BC2)-7B, and BC2-13B on CNMLE-2022. We also investigated, from seven perspectives, how effective different pathways are for combining LLMs with medical knowledge.

Results: Applied directly, ChatGPT failed the CNMLE-2022 with a score of 51. When combined with KFE, LLMs of varying sizes showed consistent and significant improvements. ChatGPT's score rose to 70.04, and GPT-4 achieved the highest score of 82.59. Both scores surpass the qualification threshold (60) and the average human score of 68.70. KFE also enabled the smaller BC2-13B to pass the examination, which shows its potential in low-resource settings.

Conclusion: By integrating medical knowledge through in-context learning, LLMs can extend clinical insight beyond language barriers. This substantially reduces language-related disparities in LLM applications and helps ensure global benefit in healthcare.
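To make the idea of knowledge and few-shot in-context enhancement concrete, here is a minimal sketch of how such a prompt might be assembled. The abstract does not specify the retriever, prompt template, or number of passages and examples used by KFE, so everything below is an assumption for illustration only.

The names `QA`, `build_kfe_prompt`, `n_passages`, and `n_shots` are hypothetical. Simple word-overlap (Jaccard) retrieval stands in for whatever retrieval method the authors actually used. Chinese text would also need a proper word segmenter rather than the regex tokenizer shown here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class QA:
    """A multiple-choice exam item (hypothetical structure)."""
    question: str
    options: dict[str, str]
    answer: str = ""


def _tokens(text: str) -> set[str]:
    # Assumed tokenizer: lowercase runs of word characters.
    return set(re.findall(r"\w+", text.lower()))


def _overlap(a: str, b: str) -> float:
    # Jaccard similarity: a stand-in for the (unspecified) KFE retriever.
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def _top_k(query: str, items, key, k: int):
    # Rank items by similarity to the query; sorted() is stable, so ties keep input order.
    return sorted(items, key=lambda x: _overlap(query, key(x)), reverse=True)[:k]


def _format_qa(qa: QA, with_answer: bool) -> str:
    opts = "\n".join(f"{label}. {text}" for label, text in sorted(qa.options.items()))
    ans = qa.answer if with_answer else ""
    # For the target question the prompt ends with "Answer:" for the LLM to complete.
    return f"Question: {qa.question}\n{opts}\nAnswer: {ans}".rstrip()


def build_kfe_prompt(target: QA, knowledge: list[str], bank: list[QA],
                     n_passages: int = 2, n_shots: int = 2) -> str:
    """Hypothetical prompt: retrieved knowledge, then solved similar questions, then the target."""
    passages = _top_k(target.question, knowledge, key=lambda p: p, k=n_passages)
    shots = _top_k(target.question, bank, key=lambda q: q.question, k=n_shots)
    parts = ["Relevant medical knowledge:"]
    parts += [f"- {p}" for p in passages]
    parts += ["", "Examples:"]
    parts += [_format_qa(s, with_answer=True) + "\n" for s in shots]
    parts.append(_format_qa(target, with_answer=False))
    return "\n".join(parts)


if __name__ == "__main__":
    kb = ["Aspirin irreversibly inhibits cyclooxygenase.",
          "Insulin lowers blood glucose."]
    bank = [QA("Which drug inhibits cyclooxygenase irreversibly?",
               {"A": "Aspirin", "B": "Insulin"}, "A")]
    target = QA("Aspirin inhibits which enzyme?", {"A": "Cyclooxygenase", "B": "Lipase"})
    print(build_kfe_prompt(target, kb, bank, n_passages=1, n_shots=1))
```

The resulting string would be sent to the LLM as a single prompt. This design keeps the model frozen and injects knowledge only through the context window. That matches the abstract's framing of KFE as an in-context learning framework rather than fine-tuning.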

Enhancing Clinical Accuracy of Medical Chatbots with Large Language Models

TCMChat: A Generative Large Language Model for Traditional Chinese Medicine

ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge

Improving Clinical Expertise in Large Language Models Using Electronic Medical Records

ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue

Integrating UMLS Knowledge into Large Language Models for Medical Question Answering

Leveraging Large Language Model as Simulated Patients for Clinical Education

MedChatZH: a Better Medical Adviser Learns from Better Instructions

An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models

[Relationship between psychological and physiological dependence and drug addiction].

Large Language Models Leverage External Knowledge to Extend Clinical Insight Beyond Language Boundaries

Enhancing Healthcare through Large Language Models: A Study on Medical Question Answering

AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator

Leveraging LLM: Implementing an Advanced AI Chatbot for Healthcare

AI Hospital: Interactive Evaluation and Collaboration of LLMs As Intern Doctors for Clinical Diagnosis

Large Language Model Prompting Techniques for Advancement in Clinical Medicine

Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering

Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health

MedChatZH: A tuning LLM for traditional Chinese medicine consultations

PMC-LLaMA: Towards Building Open-source Language Models for Medicine