Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing

Omid Rohanian, Mohammadmahdi Nouriborji, David A. Clifton

2024-01-01

Abstract: Large Language Models (LLMs), particularly those similar to ChatGPT, have significantly influenced the field of Natural Language Processing (NLP). While these models excel at general language tasks, their performance on domain-specific downstream tasks such as biomedical and clinical Named Entity Recognition (NER), Relation Extraction (RE), and Medical Natural Language Inference (NLI) is still evolving. In this context, our study investigates the potential of instruction tuning for biomedical language processing, applying this technique to two general LLMs of substantial scale. We present a comprehensive, instruction-based model trained on a dataset of approximately 200,000 instruction-focused samples. This dataset is a carefully curated compilation of existing data, adapted and reformatted to meet the specific requirements of our instruction-based tasks. This initiative is an important step towards using such models to achieve results on par with specialised encoder-only models like BioBERT and BioClinicalBERT on various classical biomedical NLP tasks. Our work includes an analysis of the dataset's composition and its impact on model performance, providing insights into the intricacies of instruction tuning. By sharing our code, models, and the distinctively assembled instruction-based dataset, we seek to encourage ongoing research and development in this area.
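The abstract describes reformatting existing annotated corpora into instruction-style samples. As a rough illustration of what such a conversion can look like, the Python sketch below turns one BIO-tagged NER sentence into an instruction/input/output record. The prompt wording, the `token/TAG` output format, and the names `InstructionSample`, `NER_INSTRUCTION`, and `bio_to_instruction` are hypothetical; they are not the paper's actual templates.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class InstructionSample:
    instruction: str
    input: str
    output: str


# Hypothetical prompt wording; not the paper's actual template.
NER_INSTRUCTION = (
    "Identify the named entities in the following text. "
    "Label every token with one of these BIO tags: {tags}."
)


def bio_to_instruction(tokens: List[str], tags: List[str], tag_set: List[str]) -> InstructionSample:
    """Convert one BIO-tagged sentence into an instruction/input/output record."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    # The target pairs each token with its tag, so the model emits one label per input token.
    target = " ".join(f"{tok}/{tag}" for tok, tag in zip(tokens, tags))
    return InstructionSample(
        instruction=NER_INSTRUCTION.format(tags=", ".join(tag_set)),
        input=" ".join(tokens),
        output=target,
    )


sample = bio_to_instruction(
    tokens=["Aspirin", "reduces", "fever", "."],
    tags=["B-Chemical", "O", "B-Disease", "O"],
    tag_set=["B-Chemical", "I-Chemical", "B-Disease", "I-Disease", "O"],
)
print(json.dumps(asdict(sample), indent=2))
```

Each record can then be written out as one JSON line. Under the same assumptions, RE and NLI samples would follow the same pattern with a different instruction text and target string (for example, a relation label or "entailment"/"contradiction"/"neutral").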

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper examines how useful large language models (LLMs) are for biomedical language processing, focusing on named entity recognition (NER), relation extraction (RE), and medical natural language inference (NLI). The authors investigate instruction tuning for two large general-purpose LLMs, training them on a dataset of approximately 200,000 instruction-formatted samples that was curated from existing data and adapted to the requirements of instruction-based tasks. The main contributions of the paper are:

1. Two models tailored to instruction tasks in the medical domain: Llama2-MedTuned-7B and Llama2-MedTuned-13B.
2. An instruction dataset for training these models to improve their performance on classical biomedical NLP tasks, with the aim of competing with specialized models such as BioBERT and BioClinicalBERT.
3. An analysis of the dataset's composition and its impact on model performance, offering insights into the nuances of instruction tuning.
4. Open-source release of the code, models, and dataset to support future research and development.

The study notes that although LLMs excel at general language tasks, they still struggle with complex tasks in specific domains, such as understanding and executing natural language instructions. Instruction tuning improves their performance on medical NLP tasks, but it has not yet reached the level of specialized models. The paper also reports ablation studies on how different sampling strategies affect model performance, comparing the results with baseline models such as DistilBERT and BioBERT. In future work, the authors plan to extend the dataset to a wider range of biomedical and clinical tasks and to integrate recent NLP techniques to further improve performance.
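The ablation on sampling strategies concerns how many examples each source dataset contributes to the training mixture. The paper's exact strategies are not described here, so the Python sketch below only illustrates one common option, temperature-scaled sampling, in which a dataset of size n_i is drawn with weight proportional to n_i^(1/T). The functions `mixture_quotas` and `sample_mixture`, and the toy dataset names and sizes, are hypothetical.

```python
import random
from typing import Dict, List


def mixture_quotas(sizes: Dict[str, int], budget: int, temperature: float = 1.0) -> Dict[str, int]:
    """Per-dataset sample counts under temperature-scaled sampling.

    temperature=1.0 gives quotas proportional to dataset size; larger values
    flatten the distribution toward uniform. Each quota is capped at the
    dataset's size, so the total can fall below `budget`, and rounding can
    shift it by a sample or two.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = {name: n ** (1.0 / temperature) for name, n in sizes.items()}
    total = sum(weights.values())
    return {
        name: min(sizes[name], round(budget * w / total))
        for name, w in weights.items()
    }


def sample_mixture(datasets: Dict[str, List], budget: int,
                   temperature: float = 1.0, seed: int = 0) -> List:
    """Draw a shuffled training mixture, sampling without replacement from each dataset."""
    rng = random.Random(seed)
    quotas = mixture_quotas({k: len(v) for k, v in datasets.items()}, budget, temperature)
    mixed: List = []
    for name, items in datasets.items():
        mixed.extend(rng.sample(items, quotas[name]))
    rng.shuffle(mixed)
    return mixed


# Hypothetical toy datasets (sizes chosen only for illustration).
toy = {
    "ner_a": [("ner_a", i) for i in range(8000)],
    "re_b": [("re_b", i) for i in range(1500)],
    "nli_c": [("nli_c", i) for i in range(500)],
}
sizes = {k: len(v) for k, v in toy.items()}
print(mixture_quotas(sizes, budget=2000, temperature=1.0))  # {'ner_a': 1600, 're_b': 300, 'nli_c': 100}
print(mixture_quotas(sizes, budget=2000, temperature=2.0))
print(len(sample_mixture(toy, budget=2000, temperature=1.0)))  # 2000
```

With these toy sizes, T = 1 reproduces the size-proportional split, while T = 2 moves samples from the largest dataset toward the smaller ones. Capping each quota at the dataset's size avoids duplicating examples, at the cost of sometimes undershooting the budget.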