Abstract: Large Language Models (LLMs) have demonstrated significant potential in transforming clinical applications. In this study, we investigate the efficacy of four techniques in adapting LLMs for clinical use-cases: continuous pretraining, instruct fine-tuning, NEFTune, and prompt engineering. We employ these methods on Mistral 7B and Mixtral 8x7B models, leveraging a large-scale clinical pretraining dataset of 50 billion tokens and an instruct fine-tuning dataset of 500 million tokens. Our evaluation across various clinical tasks reveals the impact of each technique. While continuous pretraining beyond 250 billion tokens yields marginal improvements on its own, it establishes a strong foundation for instruct fine-tuning. Notably, NEFTune, designed primarily to enhance generation quality, surprisingly demonstrates additional gains on our benchmark. Complex prompt engineering methods further enhance performance. These findings show the importance of tailoring fine-tuning strategies and exploring innovative techniques to optimize LLM performance in the clinical domain.

What problem does this paper attempt to address?

This paper asks how to optimize the performance of large language models (LLMs) in clinical applications through continuous pretraining, instruct fine-tuning, and advanced prompt engineering. Specifically, the researchers aim to:

1. **Explore the effects of continuous pretraining**: Continuing pretraining on large-scale clinical data is intended to deepen the LLMs' understanding of the clinical domain. The method can be unstable, and the researchers try to overcome this by balancing in-domain data with general-language data.
2. **Evaluate the effectiveness of different fine-tuning strategies**: In addition to conventional fine-tuning, the researchers apply NEFTune (a newer fine-tuning technique) and evaluate its performance on clinical tasks. NEFTune improves generation quality by injecting noise into the embedding layer and may bring additional performance gains.
3. **Verify the role of complex prompt engineering**: The researchers use several prompt engineering techniques (such as Chain-of-Thought and KNN CoT ensembles) to evaluate whether these methods can significantly improve model performance without additional training.

### Main problem summary

- **How can continuous pretraining be used effectively to improve LLM performance on clinical tasks?**
- **Can different fine-tuning strategies (such as NEFTune) further improve model performance?**
- **Can complex prompt engineering techniques serve as an alternative or a complement for optimizing clinical applications of LLMs?**

By systematically comparing these methods, the researchers hope to provide a useful reference for future research and to support the development of more accurate, reliable, and practically useful clinical LLMs.
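The noise injection that NEFTune performs can be illustrated with a minimal sketch. It follows the scaling described in the original NEFTune work (uniform noise in [-1, 1] multiplied by alpha / sqrt(sequence length × embedding dimension), applied only during training). The function names, the plain-list representation of embeddings, and the value `alpha=5.0` are assumptions made for illustration; they are not taken from this paper's training setup.

```python
import math
import random


def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noised copy of a (seq_len x dim) embedding matrix.

    Each entry receives uniform noise from [-1, 1], scaled by
    alpha / sqrt(seq_len * dim). `alpha` is a hypothetical setting here.
    """
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [
        [value + scale * rng.uniform(-1.0, 1.0) for value in row]
        for row in embeddings
    ]


def embed_for_step(embeddings, training, alpha=5.0, rng=None):
    """Apply noise during training only; leave embeddings untouched otherwise."""
    if training:
        return neftune_noise(embeddings, alpha, rng)
    return embeddings


if __name__ == "__main__":
    # Toy input: 4 tokens with 8-dimensional embeddings, all zeros.
    toy = [[0.0] * 8 for _ in range(4)]
    noised = embed_for_step(toy, training=True, rng=random.Random(0))
    print(noised[0][:3])
```

Because the noise magnitude shrinks with sequence length and embedding width, longer or wider inputs are perturbed less per entry. At inference time the embeddings pass through unchanged.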

Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
