Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model

Qichen Ye, Junling Liu, Dading Chong, Peilin Zhou, Yining Hua, Fenglin Liu, Meng Cao, Ziming Wang, Xuxin Cheng, Zhu Lei, Zhenhua Guo

2024-04-17

Abstract: Integrating large language models (LLMs) into healthcare holds great potential but faces challenges. Pre-training LLMs from scratch for domains like medicine is resource-heavy and often infeasible. On the other hand, relying solely on Supervised Fine-tuning (SFT) can result in overconfident predictions and may not tap into domain-specific insights. In response, we present a multi-stage training method combining Domain-specific Continued Pre-training (DCPT), SFT, and Direct Preference Optimization (DPO). In addition, we publish a 3 GB Chinese Medicine (ChiMed) dataset, encompassing medical question answering, plain texts, knowledge graphs, and dialogues, segmented into three training stages. The medical LLM trained with our pipeline, Qilin-Med, shows substantial performance improvement. In the DCPT and SFT phases, Qilin-Med achieved 38.4% and 40.0% accuracy on the CMExam test set, respectively. It outperformed the base model Baichuan-7B (accuracy: 33.5%) by 7.5%. In the DPO phase, it scored 16.66 in BLEU-1 and 27.44 in ROUGE-1 on the Huatuo-26M test set, improving on the SFT phase (12.69 in BLEU-1 and 24.21 in ROUGE-1). Additionally, we have further enhanced the model's performance through the Retrieval Augmented Generation (RAG) approach. Experiments demonstrate that Qilin-Med-RAG achieves an accuracy of 42.8% on CMExam. These results highlight the contribution of our novel training approach in building LLMs for medical applications.

Computation and Language

What problem does this paper attempt to address?

The paper addresses the difficulty of applying large language models (LLMs) in the medical field: pre-training a model from scratch is too resource-intensive, while supervised fine-tuning alone can produce overconfident predictions and miss domain-specific knowledge. It proposes Qilin-Med, a Chinese medical LLM built on Baichuan-7B and trained in three stages: domain-specific continued pre-training (DCPT), supervised fine-tuning (SFT), and direct preference optimization (DPO). The paper also constructs the ChiMed dataset, which covers several types of medical data (question answering, plain text, knowledge graphs, and dialogues) and is split to match the three training stages. With this pipeline, Qilin-Med reaches 40.0% accuracy on the CMExam test set after SFT, compared with 33.5% for Baichuan-7B, and the DPO stage raises BLEU-1 on the Huatuo-26M test set from 12.69 to 16.66 and ROUGE-1 from 24.21 to 27.44. Adding retrieval-augmented generation (Qilin-Med-RAG) brings CMExam accuracy to 42.8%. These results support the effectiveness of multi-stage training for building LLMs for medical applications.
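
As a rough illustration of the third stage, the sketch below computes the standard DPO objective for a single preference pair, using the generally published form of the loss rather than anything taken from the Qilin-Med code. The function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions made for this example; the summary above does not state the paper's hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) answer pair.

    Each argument is the summed log-probability of a full answer under
    either the model being trained (policy) or the frozen reference model
    (for example, the SFT checkpoint). beta is an assumed value.
    """
    # How much more the policy likes each answer than the reference does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the margin is zero.
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy has moved toward the preferred answer: the loss decreases.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss shrinks as the policy assigns relatively more probability to the preferred answer than the reference model does, which is how a preference stage can keep improving a model after SFT without training a separate reward model.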

TCMChat: A Generative Large Language Model for Traditional Chinese Medicine

ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences

Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare

Qibo: A Large Language Model for Traditional Chinese Medicine

Lingdan: enhancing encoding of traditional Chinese medicine knowledge for clinical reasoning tasks with large language models

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue

Large Language Models Leverage External Knowledge to Extend Clinical Insight Beyond Language Boundaries

Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset

MedChatZH: A tuning LLM for traditional Chinese medicine consultations

Me LLaMA: Foundation Large Language Models for Medical Applications

Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation

Improving Clinical Expertise in Large Language Models Using Electronic Medical Records

MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models

PMC-LLaMA: Towards Building Open-source Language Models for Medicine

LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing

Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering

DoctorGPT: A Large Language Model with Chinese Medical Question-Answering Capabilities

PMC-LLaMA: toward building open-source language models for medicine

CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios