ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation

Guangyu Wang, Guoxing Yang, Zongxin Du, Longjun Fan, Xiaohu Li
2023-06-17
Abstract: Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks, leveraging techniques such as pre-training and instruction fine-tuning. Despite these advances, their effectiveness in medical applications is limited by challenges such as factual inaccuracies, weak reasoning abilities, and a lack of grounding in real-world experience. In this study, we present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios. By incorporating extensive and diverse real-world data, such as medical records, domain-specific knowledge, and multi-round dialogue consultations, in the training process, ClinicalGPT is better prepared to handle multiple clinical tasks. Furthermore, we introduce a comprehensive evaluation framework that includes medical knowledge question-answering, medical exams, patient consultations, and diagnostic analysis of medical records. Our results demonstrate that ClinicalGPT significantly outperforms other models in these tasks, highlighting the effectiveness of our approach in adapting large language models to the critical domain of healthcare.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the challenges of using large language models (LLMs) in medical applications. Although existing LLMs perform well on natural language processing tasks, their use in the medical field is restricted by the following:

1. **Factual accuracy**: Existing LLMs may produce factual errors, which are unacceptable in medical scenarios.
2. **Reasoning ability**: These models perform poorly on complex medical reasoning tasks.
3. **Lack of real-world experience**: Because they are not trained on real-world medical data, these models often give generic answers that lack specific medical insight.

To overcome these challenges, the authors propose **ClinicalGPT**, a large language model specifically designed and optimized for clinical scenarios. By training on a large amount of real-world medical data (such as electronic medical records, domain-specific knowledge, and multi-round dialogue consultations), ClinicalGPT can better handle various clinical tasks. In addition, the authors establish a comprehensive evaluation framework, covering medical knowledge Q&A, medical examinations, patient consultations, and medical record analysis, to verify the performance of ClinicalGPT.

Specifically, the main contributions of this paper include:

- **Dataset construction**: The model is trained and evaluated on diverse medical datasets, including cMedQA2, cMedQA-KG, MD-EHR, MEDQA-MCMLE, and MedDialog.
- **Fine-tuning method**: Supervised fine-tuning (SFT) is combined with reinforcement learning (RL) to improve the performance of the model.
- **Evaluation framework**: A comprehensive set of evaluations covers multiple medical application scenarios to assess the practicality and reliability of the model.

With these methods, ClinicalGPT significantly outperforms other existing large language models on multiple medical tasks, demonstrating its potential in the medical field.
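The medical-exam part of the evaluation can be illustrated with a minimal sketch of multiple-choice scoring. This is not the authors' evaluation code: the `ExamItem` schema, the `exam_accuracy` function, and the `always_first_option` stand-in model are hypothetical names introduced here for illustration, and the sketch assumes plain accuracy over option letters, which the summary above does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ExamItem:
    """One multiple-choice question (hypothetical schema, not a real dataset format)."""
    question: str
    options: dict[str, str]  # option letter -> option text
    answer: str              # gold option letter, e.g. "B"


def exam_accuracy(items: Sequence[ExamItem],
                  predict: Callable[[ExamItem], str]) -> float:
    """Return the fraction of items whose predicted letter matches the gold letter."""
    if not items:
        raise ValueError("items must not be empty")
    correct = sum(
        predict(item).strip().upper() == item.answer.strip().upper()
        for item in items
    )
    return correct / len(items)


def always_first_option(item: ExamItem) -> str:
    """Trivial stand-in for a model: always picks the first listed option."""
    return next(iter(item.options))


if __name__ == "__main__":
    # Placeholder items only; no real exam content is reproduced here.
    demo = [
        ExamItem("Placeholder question 1", {"A": "x", "B": "y"}, "A"),
        ExamItem("Placeholder question 2", {"A": "x", "B": "y"}, "B"),
    ]
    print(exam_accuracy(demo, always_first_option))
```

In practice, `predict` would wrap a call to the model under test and parse an option letter from its output; the other evaluation tasks (consultations, medical record diagnosis) would need different scoring and are not covered by this sketch.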