Aquila-Med LLM: Pioneering Full-Process Open-Source Medical Language Models

Lulu Zhao, Weihao Zeng, Xiaofeng Shi, Hua Zhou, Donglin Hao, Yonghua Lin
2024-06-18
Abstract: Recently, both closed-source LLMs and open-source communities have made significant strides, outperforming humans in various general domains. However, their performance in specific professional fields such as medicine, especially within the open-source community, remains suboptimal due to the complexity of medical knowledge. We propose Aquila-Med, a bilingual medical LLM based on Aquila, addressing these challenges through continued pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF). We construct a large-scale Chinese and English medical dataset for continued pre-training and a high-quality SFT dataset covering extensive medical specialties. Additionally, we develop a high-quality Direct Preference Optimization (DPO) dataset for further alignment. Aquila-Med achieves notable results across single-turn and multi-turn dialogues and medical multiple-choice questions, demonstrating the effectiveness of our approach. We open-source the datasets and the entire training process, contributing valuable resources to the research community. Our models and datasets will be released at https://huggingface.co/BAAI/AquilaMed-RL.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the underperformance of language models in specialized fields, specifically medicine. Although general-domain large language models (LLMs) have surpassed human performance in many respects, their performance in professional fields such as medicine remains suboptimal, especially for open-source models. The authors propose Aquila-Med, a bilingual (Chinese and English) medical LLM built on Aquila, and improve it through continued pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF). First, they construct a large-scale Chinese and English medical corpus for continued pre-training, along with a high-quality SFT dataset of approximately 330,000 examples covering a wide range of medical specialties. They also build a high-quality direct preference optimization (DPO) dataset that includes medical question-answering and multiple-choice questions. Aquila-Med performs strongly on single-turn dialogue, multi-turn dialogue, and medical multiple-choice questions, which supports the effectiveness of the proposed approach. Finally, the authors open-source the datasets and the entire training process, releasing the model and datasets on Hugging Face as a resource for other researchers.

In summary, the paper targets the accuracy and safety of medical language models. It aims to improve the model's understanding and application of complex medical knowledge through a full training pipeline of continued pre-training, SFT, and preference alignment.
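To make the DPO alignment stage concrete, the sketch below shows the standard DPO objective (Rafailov et al., 2023) applied to a single preference pair. It is an illustration under stated assumptions, not Aquila-Med's training code: the `PreferencePair` field names, the example medical pair, and the default `beta=0.1` are hypothetical choices, and the per-sequence log-probabilities are passed in as plain numbers rather than computed from a real model.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One DPO example: a prompt with a preferred and a dispreferred answer.
    Field names are hypothetical, not the released dataset's schema."""
    prompt: str
    chosen: str
    rejected: str


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model. beta=0.1 is an
    assumed default, not a value reported by the paper.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Loss falls as the policy favors the chosen answer more than the reference does.
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


# Hypothetical medical preference pair, for illustration only.
pair = PreferencePair(
    prompt="What is a normal resting heart rate for adults?",
    chosen="For most adults, a normal resting heart rate is 60 to 100 beats per minute.",
    rejected="Heart rate does not matter.",
)

# Policy identical to the reference: margin is 0, so the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifts probability toward the chosen answer: the loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

Because the loss only depends on the difference of policy-to-reference log-ratios, DPO needs no separate reward model, which is why a curated preference dataset of (prompt, chosen, rejected) triples is the main artifact this stage requires.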