Abstract: $\textbf{Objectives}$: Large Language Models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks. However, these English-centric models struggle in non-English clinical settings, primarily because imbalanced training corpora leave them with limited clinical knowledge in the respective languages. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to enhance their performance. $\textbf{Materials and Methods}$: The latest China National Medical Licensing Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books and 381,149 medical questions to construct the medical knowledge base and question bank. The proposed Knowledge and Few-shot Enhancement In-context Learning (KFE) framework leverages the in-context learning ability of LLMs to integrate diverse external clinical knowledge sources. We evaluated KFE with ChatGPT (GPT-3.5), GPT-4, Baichuan2-7B (BC2-7B), and BC2-13B on CNMLE-2022 and investigated the effectiveness of different pathways for combining LLMs with medical knowledge from seven perspectives. $\textbf{Results}$: Applied directly, ChatGPT scored 51 and failed to reach the CNMLE-2022 qualification threshold. Combined with KFE, LLMs of varying sizes showed consistent and significant improvements. ChatGPT's score rose to 70.04, and GPT-4 achieved the highest score of 82.59. Both scores surpass the qualification threshold (60) and exceed the average human score of 68.70. KFE also enabled the smaller BC2-13B to pass the examination, showing its potential in low-resource settings. $\textbf{Conclusion}$: By integrating medical knowledge through in-context learning, LLMs can extend clinical insight beyond language barriers, significantly reducing language-related disparities in LLM applications and helping ensure global benefit in healthcare.
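
The abstract does not specify how KFE retrieves knowledge or lays out its prompt, so the following is only a minimal sketch of the general idea: pick the knowledge passages and solved questions most similar to the exam question, then place them ahead of it in a single prompt. The word-overlap retriever, the prompt wording, and every name here (`Question`, `overlap`, `top_k`, `build_prompt`) are assumptions made for illustration, not the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Question:
    stem: str
    options: Dict[str, str]
    answer: Optional[str] = None  # known for question-bank items, None for the target


def overlap(a: str, b: str) -> int:
    """Count shared word tokens; a stand-in for a real retriever (assumption)."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    return len(tokens_a & tokens_b)


def top_k(query: str, items: list, k: int, text=lambda item: item) -> list:
    """Return the k items whose text shares the most tokens with the query."""
    ranked = sorted(items, key=lambda item: overlap(query, text(item)), reverse=True)
    return ranked[:k]


def format_question(q: Question, with_answer: bool) -> str:
    lines = [f"Question: {q.stem}"]
    lines += [f"{label}. {text}" for label, text in q.options.items()]
    lines.append(f"Answer: {q.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(target: Question, knowledge_base: List[str],
                 question_bank: List[Question],
                 k_knowledge: int = 1, k_shots: int = 1) -> str:
    """Assemble retrieved knowledge, few-shot examples, and the target question."""
    passages = top_k(target.stem, knowledge_base, k_knowledge)
    shots = top_k(target.stem, question_bank, k_shots, text=lambda q: q.stem)
    parts = ["Relevant knowledge:"] + [f"- {p}" for p in passages]
    parts.append("Solved examples:")
    parts += [format_question(s, with_answer=True) for s in shots]
    parts.append(format_question(target, with_answer=False))
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Toy data invented for this sketch; not drawn from the paper's corpus.
    kb = ["Metformin is a first-line drug for type 2 diabetes.",
          "Amoxicillin is a penicillin-class antibiotic."]
    bank = [Question("Which drug is first-line for type 2 diabetes?",
                     {"A": "Metformin", "B": "Aspirin"}, "A"),
            Question("Which antibiotic belongs to the penicillin class?",
                     {"A": "Amoxicillin", "B": "Ciprofloxacin"}, "A")]
    target = Question("A patient with type 2 diabetes needs a first-line drug. Which one?",
                      {"A": "Metformin", "B": "Warfarin"})
    print(build_prompt(target, kb, bank))
```

The resulting string would be sent to whichever LLM is being evaluated. A real system working with Chinese text would likely need a different tokenizer and a stronger retriever than word overlap.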

Two-phase Framework Clinical Question-Answering; A case-study of Autocorrection for Guideline-concordance

Aligning Large Language Models for Clinical Tasks

Towards Expert-Level Medical Question Answering with Large Language Models

Integrating UMLS Knowledge into Large Language Models for Medical Question Answering

Large language models encode clinical knowledge

Large Language Model-Based Evaluation of Medical Question Answering Systems: Algorithm Development and Case Study

Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models

An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models

Reasoning with large language models for medical question answering

Answering real-world clinical questions using large language model based systems

Candidate-Heuristic In-Context Learning: A new framework for enhancing medical visual question answering with LLMs

An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice

Adaptive Reasoning and Acting in Medical Language Agents

Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering

Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds

A Framework to Assess Clinical Safety and Hallucination Rates of LLMs for Medical Text Summarisation

Dynamic Q&A of Clinical Documents with Large Language Models

Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models

Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain

Large Language Models Leverage External Knowledge to Extend Clinical Insight Beyond Language Boundaries

A Novel Question-Answering Framework for Automated Abstract Screening Using Large Language Models