Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing

Omid Rohanian, Mohammadmahdi Nouriborji, David A. Clifton
2024-01-01
Abstract: Large Language Models (LLMs), particularly those similar to ChatGPT, have significantly influenced the field of Natural Language Processing (NLP). While these models excel in general language tasks, their performance in domain-specific downstream tasks such as biomedical and clinical Named Entity Recognition (NER), Relation Extraction (RE), and Medical Natural Language Inference (NLI) is still evolving. In this context, our study investigates the potential of instruction tuning for biomedical language processing, applying this technique to two general LLMs of substantial scale. We present a comprehensive, instruction-based model trained on a dataset that consists of approximately 200,000 instruction-focused samples. This dataset is a carefully curated compilation of existing data, adapted and reformatted to align with the specific requirements of our instruction-based tasks. This initiative represents an important step in utilising such models to achieve results on par with specialised encoder-only models like BioBERT and BioClinicalBERT for various classical biomedical NLP tasks. Our work includes an analysis of the dataset's composition and its impact on model performance, providing insights into the intricacies of instruction tuning. By sharing our code, models, and the assembled instruction-based dataset, we seek to encourage ongoing research and development in this area.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper examines how well large language models (LLMs) perform in biomedical language processing, particularly on tasks such as named entity recognition (NER), relation extraction (RE), and medical natural language inference (NLI). The authors investigate the potential of instruction tuning for two large general-purpose LLMs, training them on a dataset of approximately 200,000 instruction-based samples. The dataset is compiled from existing data and reformatted to meet the requirements of instruction-based tasks.

The main contributions of the paper include:

1. Two models tailored for instruction tasks in the medical domain: Llama2-MedTuned-7B and Llama2-MedTuned-13B.
2. An instruction dataset for training these models to improve their performance on classical biomedical NLP tasks, where they compete with specialized models such as BioBERT and BioClinicalBERT.
3. An analysis of the composition of the dataset and its impact on model performance, providing insights into the nuances of instruction tuning.
4. Open-source release of the code, models, and dataset to facilitate future research and development.

The study found that although LLMs excel in general language tasks, they still have limitations on complex domain-specific tasks, including understanding and executing natural language instructions. Instruction tuning can improve the performance of LLMs on medical NLP tasks, but the tuned models have not yet reached the level of specialized models. The authors also conducted ablation studies that explore the impact of different sampling strategies on model performance and compare the results with baseline models such as DistilBERT and BioBERT.

In future work, the authors plan to expand the dataset to cover a wider range of biomedical and clinical tasks and to explore the integration of recent NLP techniques to further enhance model performance.
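
To make the idea of reformatting existing data into instruction-based samples concrete, the following is a minimal sketch of how one BIO-tagged NER sentence could be turned into an instruction record. The function name `ner_to_instruction`, the `instruction`/`input`/`output` field names, and the prompt wording are all assumptions chosen for illustration; the actual template used by the authors is not given in this summary.

```python
from typing import Dict, List


def ner_to_instruction(
    tokens: List[str], tags: List[str], entity_type: str
) -> Dict[str, str]:
    """Convert one BIO-tagged sentence into an instruction-style record.

    Hypothetical format: the field names and prompt text are illustrative
    assumptions, not the template used in the paper.
    """
    entities: List[str] = []
    current: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag == f"B-{entity_type}":
            # A new entity starts; close any entity still open.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == f"I-{entity_type}" and current:
            # Continue the entity that is currently open.
            current.append(token)
        else:
            # Outside tag (or a different entity type): close the open entity.
            if current:
                entities.append(" ".join(current))
            current = []

    if current:
        entities.append(" ".join(current))

    return {
        "instruction": (
            f"List every {entity_type} mention in the input text, "
            "separated by semicolons. Answer 'none' if there are no mentions."
        ),
        "input": " ".join(tokens),
        "output": "; ".join(entities) if entities else "none",
    }


if __name__ == "__main__":
    record = ner_to_instruction(
        ["Aspirin", "may", "cause", "gastric", "bleeding", "."],
        ["O", "O", "O", "B-Disease", "I-Disease", "O"],
        "Disease",
    )
    print(record["output"])
```

A record in this shape can be concatenated into a single prompt and target string for supervised fine-tuning; RE and NLI samples could be mapped the same way with a different instruction text and a label as the output.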