Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs

Clément Christophe, Tathagata Raha, Svetlana Maslenkova, Muhammad Umar Salman, Praveen K Kanithi, Marco AF Pimentel, Shadab Khan
2024-09-23
Abstract: Large Language Models (LLMs) have demonstrated significant potential in transforming clinical applications. In this study, we investigate the efficacy of four techniques in adapting LLMs for clinical use-cases: continuous pretraining, instruct fine-tuning, NEFTune, and prompt engineering. We employ these methods on Mistral 7B and Mixtral 8x7B models, leveraging a large-scale clinical pretraining dataset of 50 billion tokens and an instruct fine-tuning dataset of 500 million tokens. Our evaluation across various clinical tasks reveals the impact of each technique. While continuous pretraining beyond 250 billion tokens yields marginal improvements on its own, it establishes a strong foundation for instruct fine-tuning. Notably, NEFTune, designed primarily to enhance generation quality, surprisingly demonstrates additional gains on our benchmark. Complex prompt engineering methods further enhance performance. These findings show the importance of tailoring fine-tuning strategies and exploring innovative techniques to optimize LLM performance in the clinical domain.
Computation and Language
What problem does this paper attempt to address?
This paper addresses how to optimize the performance of large language models (LLMs) in clinical applications through continuous pretraining, instruct fine-tuning, and advanced prompt engineering. Specifically, the researchers aim to:

1. **Explore the effects of continuous pretraining**: By continuing to pretrain LLMs on large-scale clinical data, the researchers aim to strengthen the models' understanding of the clinical domain. Continuous pretraining can be unstable, so they try to mitigate this by balancing in-domain (clinical) data with general-language data.
2. **Evaluate the effectiveness of different fine-tuning strategies**: Beyond standard instruct fine-tuning, the researchers apply NEFTune, a newer fine-tuning technique, and evaluate it on clinical tasks. NEFTune injects random noise into the embedding layer during training; it was designed to improve generation quality and may also bring additional performance gains.
3. **Verify the role of complex prompt engineering**: The researchers evaluate several prompt engineering techniques, such as Chain-of-Thought (CoT) prompting and KNN CoT ensembles, to determine whether they can significantly improve model performance without any additional training.

### Main problem summary

- **How can continuous pretraining be used effectively to improve LLM performance on clinical tasks?**
- **Can alternative fine-tuning strategies, such as NEFTune, further improve model performance?**
- **Can complex prompt engineering serve as an alternative or a complement to training when optimizing LLMs for clinical applications?**

By systematically comparing these methods, the researchers aim to provide a useful reference for future research and to support the development of more accurate, reliable, and practically impactful clinical LLMs.
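Point 1 mentions balancing in-domain and general-language data during continuous pretraining. The summary does not state the actual mixing ratio or sampling scheme, so the following is only a hypothetical sketch: each training document is drawn from the clinical corpus with probability `clinical_ratio` and from the general-language corpus otherwise. The function `mix_corpora` and all of its parameters are illustrative assumptions, not the authors' pipeline.

```python
import random
from typing import List


def mix_corpora(
    clinical: List[str],
    general: List[str],
    clinical_ratio: float = 0.5,
    n_samples: int = 10,
    seed: int = 0,
) -> List[str]:
    """Draw a training stream from a clinical and a general-language corpus.

    Hypothetical mixing scheme: each draw picks the clinical corpus with
    probability `clinical_ratio` (otherwise the general corpus), then samples
    one document uniformly, with replacement, from the chosen corpus.
    """
    if not 0.0 <= clinical_ratio <= 1.0:
        raise ValueError("clinical_ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    stream = []
    for _ in range(n_samples):
        # Pick the source corpus first, then a document from that corpus.
        source = clinical if rng.random() < clinical_ratio else general
        stream.append(rng.choice(source))
    return stream
```

In a real pretraining run the sources would be token-level shards rather than whole strings, and the ratio itself is a hyperparameter to tune: a higher `clinical_ratio` adds more domain signal, while a larger general-language share helps preserve general capabilities.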
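NEFTune, named in point 2, perturbs the input token embeddings with random noise during fine-tuning only. The sketch below follows the scaling rule from the original NEFTune paper: noise drawn uniformly from [-1, 1] is multiplied by alpha / sqrt(L * d), where L is the sequence length and d is the embedding dimension. It assumes plain Python lists in place of framework tensors; the function name `neftune_noise` and the default `alpha=5.0` are illustrative choices, not values taken from this paper's training setup.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def neftune_noise(embeddings: Matrix, alpha: float = 5.0, seed: int = 0) -> Matrix:
    """Return a noisy copy of one sequence of token embeddings (L rows x d columns).

    Each element receives noise u * alpha / sqrt(L * d) with u ~ Uniform(-1, 1).
    The input is not modified. At inference time no noise is added.
    """
    seq_len = len(embeddings)
    dim = len(embeddings[0]) if seq_len else 0
    if seq_len == 0 or dim == 0:
        return [row[:] for row in embeddings]  # nothing to perturb
    # Scale shrinks as sequences get longer or embeddings get wider.
    scale = alpha / math.sqrt(seq_len * dim)
    rng = random.Random(seed)
    return [[x + scale * rng.uniform(-1.0, 1.0) for x in row] for row in embeddings]
```

For a 3-token sequence with 4-dimensional embeddings and `alpha=5.0`, every element moves by at most 5 / sqrt(12), roughly 1.44. In a real training loop the noise would be applied to each batch's embedding output with a fresh random draw per step.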
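Point 3 names KNN CoT ensembles without detailing them. One common recipe, used in Microsoft's Medprompt, selects few-shot chain-of-thought exemplars whose embeddings are nearest to the test question, prompts the model to reason step by step, and takes a majority vote over several sampled answers. The sketch below assumes that recipe and assumes exemplar embeddings are already computed; the helper names are hypothetical, and the actual LLM and embedding-model calls are left out.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_exemplars(query_vec: Sequence[float], pool: List[Tuple[List[float], str]], k: int = 3) -> List[str]:
    """Return the k exemplar texts whose embeddings are most similar to the query."""
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_cot_prompt(question: str, exemplars: List[str]) -> str:
    """Prepend the retrieved exemplars and ask for step-by-step reasoning."""
    shots = "\n\n".join(exemplars)
    return f"{shots}\n\nQuestion: {question}\nLet's think step by step."


def majority_vote(answers: List[str]) -> str:
    """Pick the most frequent final answer across sampled CoT runs."""
    return Counter(answers).most_common(1)[0][0]
```

In use, one would call `knn_exemplars` for each test question, send `build_cot_prompt(...)` to the model several times (Medprompt additionally shuffles the answer choices between runs), parse each final answer, and pass the list to `majority_vote`.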