Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering

Zhen Guo, Yining Hua
2023-11-01
Abstract: Large language models exhibit promising general capabilities but often lack the specialized knowledge required for domain-specific tasks. Developing domain experts from a base model enables a range of applications without prohibitive training costs. This work demonstrates a method that uses continuous training and instruction fine-tuning to rapidly adapt Llama 2 base models to the Chinese medical domain. We first conduct continuous training on 1B tokens from Chinese medical references to teach relevant vocabulary and knowledge. The models are then fine-tuned on 54K examples sourced from the Chinese National Medical Licensing Examination. Experiments on Chinese medical data confirm the effectiveness of this approach, producing a model comparable to GPT-3.5-turbo while using far fewer computational resources. The resulting domain-specific model could be useful for various Chinese medical applications. More broadly, this work provides a template for domain-specific training of large language models in areas where pre-trained models lack the required expertise, such as law, science, and engineering.
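The two training stages in the abstract differ mainly in how the data is prepared and which tokens contribute to the loss. The Python sketch below shows one common way to set this up. It is an assumption, not the authors' code, and the paper summary does not say whether prompt tokens were masked. Continuous training is modeled as packing raw reference text into fixed-length blocks in which every token is a prediction target. Instruction fine-tuning is modeled as converting one multiple-choice exam item into a prompt/response pair, with the prompt tokens masked out of the loss. Several pieces are hypothetical stand-ins: the character-level `toy_tokenize` replaces the Llama 2 tokenizer, and `EOS_ID`, the prompt template, and the helper names `pack_corpus` and `build_sft_example` are illustrative. `IGNORE_INDEX = -100` matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import Dict, List, Tuple

EOS_ID = 0           # hypothetical end-of-document token id
IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch CrossEntropyLoss default)


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical character-level tokenizer standing in for the Llama 2 tokenizer."""
    return [ord(ch) for ch in text]


def pack_corpus(documents: List[str], block_size: int) -> List[List[int]]:
    """Stage 1 (continuous training): concatenate documents separated by EOS and
    cut the token stream into fixed-length blocks; every token is a training target."""
    stream: List[int] = []
    for doc in documents:
        stream.extend(toy_tokenize(doc))
        stream.append(EOS_ID)
    n_full = len(stream) // block_size  # the incomplete tail block is dropped
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


def build_sft_example(
    question: str, options: Dict[str, str], answer: str
) -> Tuple[List[int], List[int]]:
    """Stage 2 (instruction fine-tuning): turn one multiple-choice exam item into
    (input_ids, labels), masking the prompt so only the answer tokens are scored."""
    option_text = "\n".join(f"{key}. {value}" for key, value in sorted(options.items()))
    prompt = f"Question: {question}\n{option_text}\nAnswer: "  # hypothetical template
    response = f"{answer}. {options[answer]}"
    prompt_ids = toy_tokenize(prompt)
    response_ids = toy_tokenize(response) + [EOS_ID]
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


# Usage with made-up sample text (not from the paper's datasets).
blocks = pack_corpus(["高血压是常见病。", "糖尿病需要控制血糖。"], block_size=8)
input_ids, labels = build_sft_example(
    "哪种药物主要用于降低血糖？", {"A": "降压药", "B": "胰岛素"}, "B"
)
```

With `block_size=8`, the two sample documents produce 20 tokens including separators, which yields two full blocks; the 4-token tail is dropped. Dropping the tail instead of padding it discards at most `block_size - 1` tokens per corpus, so the loss is insignificant at the 1B-token scale.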
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the lack of expertise that large language models (LLMs) show on domain-specific tasks. Current LLMs such as GPT-3 perform well on general language tasks, but their performance in specialized fields such as medicine and law falls short. The paper proposes adapting a general language model to a specific domain through continuous pre-training followed by instruction fine-tuning, which strengthens the model's domain expertise without excessive computational cost. The study focuses on the Chinese medical domain and demonstrates the method's effectiveness: the resulting model approaches GPT-3.5-turbo in performance while requiring substantially fewer computational resources. The method also offers a potential template for other fields that require specialized knowledge, such as law, science, and engineering.