Abstract: The medical literature contains valuable knowledge, such as the clinical symptoms, diagnosis, and treatments of a particular disease. Named Entity Recognition (NER) is the initial step in extracting this knowledge from unstructured text and presenting it as a Knowledge Graph (KG). However, previous NER approaches have often suffered from small-scale human-labelled training data. Furthermore, extracting knowledge from Chinese medical literature is more complex because written Chinese does not mark word boundaries. Recently, pretraining models, which learn representations carrying prior semantic knowledge from large-scale unlabelled corpora, have achieved state-of-the-art results on a wide variety of Natural Language Processing (NLP) tasks. However, the capabilities of pretraining models have not been fully exploited, and the application of pretraining models other than BERT to specific domains, such as NER in Chinese medical literature, is also of interest. In this paper, we enhance the performance of NER in Chinese medical literature using pretraining models. First, we propose a data augmentation method that replaces words in the training set with synonyms through the Masked Language Model (MLM), which is a pretraining task. Then, we treat NER as the downstream task of the pretraining model and transfer the prior semantic knowledge obtained during pretraining to it. Finally, we conduct experiments to compare the performance of six pretraining models (BERT, BERT-WWM, BERT-WWM-EXT, ERNIE, ERNIE-tiny, and RoBERTa) in recognizing named entities in Chinese medical literature. The effects of feature extraction and fine-tuning, as well as of different downstream model structures, are also explored. Experimental results demonstrate that the proposed data augmentation method yields meaningful improvements in recognition performance. In addition, RoBERTa-CRF achieves the highest F1-score compared with previous methods and the other pretraining models.
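
The MLM-based augmentation step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `augment_with_mlm` and `toy_fill_mask`, the `replace_prob` parameter, the word-level tokens, and the choice to replace only tokens labelled `O` (so that entity labels stay aligned) are all assumptions made for illustration. The `fill_mask` argument stands in for a real pretrained masked language model; here it is a hard-coded toy predictor so that the example runs without any model download.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK = "[MASK]"

# A predictor receives the masked sentence and the masked position and
# returns a replacement token, or None if it has no suggestion.
FillMask = Callable[[List[str], int], Optional[str]]


def augment_with_mlm(
    tokens: List[str],
    labels: List[str],
    fill_mask: FillMask,
    rng: random.Random,
    replace_prob: float = 0.15,
) -> Tuple[List[str], List[str]]:
    """Return a new (tokens, labels) pair with some tokens replaced.

    Assumption: only tokens labelled "O" are replaced, one token for one
    token, so the label sequence can be copied unchanged.
    """
    new_tokens = list(tokens)
    for i, label in enumerate(labels):
        # Skip entity tokens and tokens not selected for replacement.
        if label != "O" or rng.random() >= replace_prob:
            continue
        masked = new_tokens[:i] + [MASK] + new_tokens[i + 1:]
        candidate = fill_mask(masked, i)
        if candidate is not None and candidate != new_tokens[i]:
            new_tokens[i] = candidate
    return new_tokens, list(labels)


def toy_fill_mask(masked_tokens: List[str], position: int) -> Optional[str]:
    """Hypothetical stand-in for a masked language model prediction."""
    # Suggest a near-synonym only for the slot right after "患者".
    if position > 0 and masked_tokens[position - 1] == "患者":
        return "伴有"
    return None


tokens = ["患者", "出现", "发热", "症状"]
labels = ["O", "O", "B-SYM", "O"]
aug_tokens, aug_labels = augment_with_mlm(
    tokens, labels, toy_fill_mask, random.Random(0), replace_prob=1.0
)
print(aug_tokens, aug_labels)
```

In practice the toy predictor would be swapped for top-ranked predictions from a pretrained model, and the augmented sentences would be added to the original training set rather than replacing it.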

Named Entity Recognition Based on Pre-training Model and Multi-head Attention Mechanism

Improving Biomedical Named Entity Recognition with a Unified Multi-Task MRC Framework

Pretraining Multi-modal Representations for Chinese NER Task with Cross-Modality Attention

BERT Named Entity Recognition with Self-attention Mechanism

Named entity recognition model based on Multi‐BiLSTM and competition mechanism

DBM-CNER: A Dual-Branch Multifeature Model for Chinese Named Entity Recognition

Chinese Named Entity Recognition Method Combining ALBERT and a Local Adversarial Training and Adding Attention Mechanism

Chinese Named Entity Recognition with a Multi-Phase Model

Coarse-to-Fine Pre-training for Named Entity Recognition

Named entity recognition based on semi-supervised ensemble learning with the improved tri-training algorithm

Enhanced Chinese named entity recognition with multi-granularity BERT adapter and efficient global pointer

A Chinese named entity recognition model: integrating label knowledge and lexicon information

MSFM: Multi-view Semantic Feature Fusion Model for Chinese Named Entity Recognition

Named Entity Recognition in Chinese Medical Literature Using Pretraining Models

Chinese Clinical Named Entity Recognition with ALBERT and MHA Mechanism

CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition

Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach

A BIGRU-Based Stacked Attention Network for Biomedical Named Entity Recognition with Chinese EMRs

Named entity recognition for Chinese based on global pointer and adversarial training

Multi-Grained Named Entity Recognition

Named entity recognition of Chinese electronic medical records based on a hybrid neural network and medical MC-BERT