Abstract: The medical literature contains valuable knowledge, such as the clinical symptoms, diagnosis, and treatments of a particular disease. Named Entity Recognition (NER) is the first step in extracting this knowledge from unstructured text and representing it as a Knowledge Graph (KG). However, previous NER approaches have often been limited by small-scale, human-labelled training data. Furthermore, extracting knowledge from Chinese medical literature is more complex because Chinese text has no explicit delimiters between words. Recently, pretraining models, which learn representations carrying prior semantic knowledge from large-scale unlabelled corpora, have achieved state-of-the-art results on a wide variety of Natural Language Processing (NLP) tasks. However, the capabilities of pretraining models have not been fully exploited, and the application of pretraining models other than BERT in specific domains, such as NER in Chinese medical literature, also merits investigation. In this paper, we improve the performance of NER in Chinese medical literature using pretraining models. First, we propose a data augmentation method that replaces words in the training set with synonyms obtained through the Masked Language Model (MLM), which is one of the pretraining tasks. Then, we treat NER as a downstream task of the pretraining model and transfer the prior semantic knowledge acquired during pretraining to it. Finally, we conduct experiments comparing six pretraining models (BERT, BERT-WWM, BERT-WWM-EXT, ERNIE, ERNIE-tiny, and RoBERTa) on recognizing named entities in Chinese medical literature. We also explore the effects of feature extraction versus fine-tuning, as well as of different downstream model structures. Experimental results show that the proposed data augmentation method yields meaningful improvements in recognition performance. In addition, RoBERTa-CRF achieves the highest F1-score compared with previous methods and the other pretraining models.
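
The MLM-based augmentation is described above only at a high level. The following is a minimal sketch under stated assumptions, not the paper's implementation: sentences are character-level and BIO-tagged; only tokens labelled `O` are replaced, one-for-one, so the entity labels stay aligned; and the masked language model is abstracted as a hypothetical callable `mlm` that returns ranked candidates for a single `[MASK]` slot. The names `augment_sentence`, `MaskedLM`, and the `replace_prob` default are illustrative assumptions, and the paper's actual token-selection and synonym-filtering rules may differ.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

MASK = "[MASK]"

# Hypothetical interface (assumption): takes a token list containing exactly one
# MASK and returns candidate tokens for that slot, best first. In practice this
# would wrap a pretrained Chinese MLM; here it is left abstract.
MaskedLM = Callable[[List[str]], List[str]]


def augment_sentence(
    tokens: Sequence[str],
    labels: Sequence[str],
    mlm: MaskedLM,
    replace_prob: float = 0.15,  # arbitrary illustrative default
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Return one augmented copy of a BIO-tagged, character-level sentence.

    Assumption: only tokens labelled "O" are candidates for replacement, and each
    is swapped one-for-one, so the label sequence can be reused unchanged.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    if rng is None:
        rng = random.Random()
    new_tokens = list(tokens)
    for i, label in enumerate(labels):
        # Skip entity tokens; keep each "O" token with probability 1 - replace_prob.
        if label != "O" or rng.random() >= replace_prob:
            continue
        # Mask position i in the current (possibly already augmented) sentence.
        masked = new_tokens[:i] + [MASK] + new_tokens[i + 1:]
        # Take the best-ranked prediction that actually changes the token.
        for candidate in mlm(masked):
            if candidate != tokens[i]:
                new_tokens[i] = candidate
                break
    return new_tokens, list(labels)
```

In practice `mlm` could wrap a pretrained Chinese MLM (for example a BERT or RoBERTa checkpoint) and return its top-k predictions for the masked position; the augmented copies would then presumably be added alongside the original training sentences.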

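The abstract names RoBERTa-CRF as the best configuration without detailing the CRF layer. Below is a minimal, generic sketch of linear-chain CRF Viterbi decoding, assuming the pretrained encoder supplies per-character tag scores (emissions) and the CRF supplies learned transition, start, and end scores. The function name `viterbi_decode`, the plain-list argument layout, and the toy BIO tag indices are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
    end: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score under a linear-chain CRF.

    emissions[t][j]   : encoder score for tag j at position t (e.g. from RoBERTa)
    transitions[i][j] : score for moving from tag i to tag j
    start[j], end[j]  : scores for starting / ending a sequence in tag j
    """
    if not emissions:
        raise ValueError("emissions must contain at least one position")
    n_tags = len(start)
    # score[j] = best score of any tag path that ends in tag j at the current position
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag i for arriving at tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            bp.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(bp)
    final = [score[j] + end[j] for j in range(n_tags)]
    best_last = max(range(n_tags), key=lambda j: final[j])
    # Follow backpointers from the last position to recover the full path
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, final[best_last]
```

Because a CRF scores whole tag sequences, large negative scores on invalid moves (such as starting in an `I-` tag, or `O` followed by an `I-` tag) steer decoding toward well-formed BIO output, which independent per-token classification does not guarantee.
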
Named Entity Recognition in Chinese Medical Literature Using Pretraining Models