Using a Pre-Trained Language Model for Medical Named Entity Extraction in Chinese Clinic Text

Mengyuan Zhang, Jin Wang, Xuejie Zhang
DOI: https://doi.org/10.1109/iceiec49280.2020.9152257
2020-01-01
Abstract: Named entity recognition (NER) in Chinese clinical text is challenging. The difficulties include the complex structure of medical text, the wide variation in entity length, and identical entity strings that belong to different categories in different contexts. To address these problems, we propose a model that combines a pre-trained language model with a bi-directional long short-term memory (BiLSTM) network and a conditional random field (CRF). Because of the specialized nature of medical texts, we do not employ Chinese word segmentation tools. Instead, each character is used as an input feature and mapped to a character embedding by the embedding layer of the bi-directional encoder representations from transformers (BERT) model. A BiLSTM layer encodes the character embeddings, and a CRF layer outputs the final label sequence. Experiments on CNMER2019 compare the proposed model with several previous models. The results show that the proposed model outperforms the other models on NER in Chinese clinical text.
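The final step of the pipeline, in which a CRF picks one label per character, can be illustrated with a minimal sketch of Viterbi decoding. This is not the authors' implementation: the tag set (`O`, `B-SYM`, `I-SYM`), the emission scores standing in for BiLSTM outputs, and the transition scores are all invented for illustration.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][tag] is the per-character score for `tag` at position t
    (a stand-in for what a BiLSTM layer would produce).
    transitions[(prev, cur)] is the score for moving from `prev` to `cur`;
    missing pairs default to 0.0.
    """
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical example: one label per character, no word segmentation.
chars = ["患", "者", "胃", "痛"]
tags = ["O", "B-SYM", "I-SYM"]
emissions = [
    {"O": 2.0, "B-SYM": 0.1, "I-SYM": 0.0},
    {"O": 1.5, "B-SYM": 0.2, "I-SYM": 0.3},
    {"O": 0.4, "B-SYM": 1.0, "I-SYM": 0.9},
    {"O": 0.5, "B-SYM": 0.3, "I-SYM": 1.2},
]
transitions = {
    ("O", "I-SYM"): -10.0,    # an entity cannot continue without a B- tag
    ("B-SYM", "I-SYM"): 0.5,  # continuing an entity is mildly favored
}
predicted = viterbi_decode(emissions, transitions, tags)
print(list(zip(chars, predicted)))
```

The transition scores are what distinguish a CRF layer from independent per-character classification: they let the decoder penalize label sequences that are locally plausible but structurally invalid.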