Abstract: Using language models (LMs) pre-trained in a self-supervised setting on large corpora and then fine-tuning for a downstream task has helped to deal with the problem of limited labeled data for supervised learning tasks such as Named Entity Recognition (NER). Recent research in biomedical language processing has offered a number of biomedical LMs pre-trained using different methods and techniques that advance results on many BioNLP tasks, including NER. However, there is still a lack of a comprehensive comparison of which pre-training approaches work best in the biomedical domain. This paper aims to investigate different pre-training methods, such as pre-training the biomedical LM from scratch and pre-training it in a continued fashion. We compare existing methods with our proposed pre-training method of initializing weights for new tokens by distilling existing weights from the BERT model inside the context where the tokens were found. The method helps to speed up the pre-training stage and improve performance on NER. In addition, we compare how masking rate, corruption strategy, and masking strategies impact the performance of the biomedical LM. Finally, using the insights from our experiments, we introduce a new biomedical LM (BIOptimus), which is pre-trained using Curriculum Learning (CL) and the contextualized weight distillation method. Our model sets new states of the art on several biomedical NER tasks. We release our code and all pre-trained models.

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to optimize pre-training methods for biomedical language models so that they perform better on named entity recognition (NER). Specifically, the authors focus on the following aspects:

1. **Lack of a comprehensive comparison of pre-training methods**: Although several biomedical language models already exist (such as BioBERT and PubMedBERT), their pre-training methods have not been systematically compared and evaluated. The authors therefore compare different pre-training methods to find a better strategy.
2. **Vocabulary problem**: Continued pre-training typically reuses the general-domain vocabulary, which lacks domain-specific terms and can hurt the model's handling of in-domain terminology. The authors propose a new weight-initialization method that distills contextualized weights from an existing BERT model to address this.
3. **Influence of masking strategies and masking rates**: Masking strategy and masking rate both affect how well pre-training works. The authors compare several masking strategies and masking rates experimentally to determine the best settings.
4. **Application of curriculum learning (CL)**: The authors introduce curriculum learning, gradually increasing task complexity to help the model learn better. They report that this improves the model's performance in the pre-training stage.

### Main contributions

- Propose a new weight-initialization method that initializes the embeddings of new vocabulary tokens by distilling contextualized weights from an existing BERT model.
- Systematically compare different pre-training methods, including pre-training from scratch, continued pre-training, and hybrid pre-training, and analyze their performance on NER.
- Explore the influence of different masking strategies and masking rates on pre-training.
- Introduce curriculum learning, gradually increasing the complexity of the pre-training tasks to improve model performance.

### Experimental results

The authors evaluate their method on multiple NER datasets (such as BC5-chem, BC5-disease, and NCBI-disease). The results show that the model pre-trained with curriculum learning and contextualized weight distillation (BIOptimus 0.4) achieves new state-of-the-art results on several datasets, notably BC5-chem, NCBI-disease, and BC2GM.

In short, this paper aims to optimize pre-training methods for biomedical language models through systematic research and experiments, thereby improving their performance on named entity recognition.
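The weight-initialization idea described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes the new token's embedding is the plain average of the contextual vectors of its old-vocabulary subword pieces over every sentence in which the token occurs. The names `init_new_token`, `toy_split`, and `toy_encode` are hypothetical, and the toy tokenizer and encoder only stand in for a real BERT tokenizer and model.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def init_new_token(
    new_token: str,
    contexts: Sequence[Sequence[str]],
    split_into_pieces: Callable[[str], List[str]],
    encode: Callable[[List[str]], List[Vector]],
) -> Vector:
    """Build an initial embedding for new_token from its contexts.

    contexts: sentences (lists of words) that contain new_token.
    split_into_pieces: old-vocabulary tokenizer for a single word.
    encode: maps a list of pieces to one contextual vector per piece.
    Assumption: averaging is used as the aggregation step.
    """
    per_occurrence: List[Vector] = []
    for words in contexts:
        pieces: List[str] = []
        spans: List[Tuple[int, int]] = []  # piece ranges covering new_token
        for word in words:
            word_pieces = split_into_pieces(word)
            if word == new_token:
                spans.append((len(pieces), len(pieces) + len(word_pieces)))
            pieces.extend(word_pieces)
        hidden = encode(pieces)  # contextual vectors, aligned with pieces
        for start, end in spans:
            per_occurrence.append(mean_vector(hidden[start:end]))
    return mean_vector(per_occurrence)


def toy_split(word: str) -> List[str]:
    """Hypothetical tokenizer: cut a word into chunks of up to 4 characters."""
    return [word[i:i + 4] for i in range(0, len(word), 4)]


def toy_encode(pieces: List[str]) -> List[Vector]:
    """Hypothetical encoder: [piece length, number of pieces in the sentence]."""
    n = float(len(pieces))
    return [[float(len(p)), n] for p in pieces]


contexts = [["acute", "nephritis"], ["nephritis", "is", "treated"]]
vec = init_new_token("nephritis", contexts, toy_split, toy_encode)
print(vec)  # [3.0, 5.5]
```

With a real model, `encode` would return hidden states from the existing BERT model, and the resulting vector would be written into the row of the enlarged embedding matrix that belongs to the new token before pre-training continues.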

BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition
