Research on Named Entity Recognition in Chinese EMR Based on Semi-Supervised Learning with Dual Selected Strategy.

Jianzhuo Yan, Yanan Geng, Hongxia Xu, Yongchuan Yu, Shaofeng Tan, Dongdong He
DOI: https://doi.org/10.1145/3446132.3446407
2020-01-01
Abstract: As electronic medical record (EMR) systems are built out, medical record data accumulates, and extracting essential information from these resources has become a concern. Named entity recognition (NER) is the first step. With the help of doctors, we built a small annotated corpus of Chinese electronic medical records. Supervised NER methods, however, require a large manually labeled corpus. To reduce the annotation cost and make better use of the unlabeled corpus, this paper proposes CEMRNER, a semi-supervised NER model for Chinese electronic medical records based on ALBERT-BiLSTM-CRF. The model uses a Bidirectional Long Short-Term Memory network (BiLSTM) and a Conditional Random Field (CRF) for sequence labeling, and introduces the pre-trained language model ALBERT to address the problem of Chinese text representation. We also propose a dual selected strategy that selects high-confidence samples to expand the training set. The dual strategy helps ensure the accuracy of the automatically labeled data and reduces the propagation of errors across iterations of semi-supervised learning. Experiments and analysis show that, compared with other models, this method is more accurate and comprehensive: precision, recall, and F1 score are 85.45%, 87.81%, and 86.61%, respectively. The results show that combining a semi-supervised method with pre-trained ALBERT can improve recognition accuracy when little labeled data is available.
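As a quick consistency check on the reported metrics, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is a hypothetical helper, not something from the paper, and it assumes the standard F1 definition applies to the reported figures.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
precision, recall = 85.45, 87.81
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 86.61%"
```

The recomputed value agrees with the 86.61% stated in the abstract.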