Chinese Medical Named Entity Recognition Based on Fusion of Global Features and Multi-Local Features
Huarong Sun, Jianfeng Wang, Bo Li, Xiyuan Cao, Junbin Zang, Chenyang Xue, Zhidong Zhang
DOI: https://doi.org/10.1109/access.2023.3339610
IF: 3.9
2023-12-20
IEEE Access
Abstract: Chinese medical Named Entity Recognition (NER) is a Natural Language Processing (NLP) task that aims to extract key information from Chinese medical texts. Recently, the Transformer has become the mainstream approach in NLP owing to its powerful global feature extraction. However, entities in NER usually appear as subsequences, so local features cannot be neglected, and the uncertainty of Chinese word segmentation makes the task harder. In this paper, we propose a network structure that combines global feature extraction with multi-local feature extraction to improve the performance of Chinese medical NER. On top of the global features extracted by the Transformer, a Bi-LSTM extracts the multi-local features, and a context integration mechanism enhances these local features by integrating both forward and backward global contexts in each cell, which gives a more comprehensive representation of individual cells. An attention-based feature fusion method is also proposed, which lets the decoder focus on the information most important for predicting the current character. During global feature extraction, a flat-lattice structure is introduced to generate all potential results of Chinese word segmentation, and a span-based relative positional encoding integrates direction and distance perception, overcoming the Transformer's inability to capture sequential characteristics. Finally, a CRF with conditional constraints serves as the decoder of the model. Experimental results on two benchmark datasets show the effectiveness of our model: the method significantly outperforms state-of-the-art methods on the medical NER task, achieving 93.64% on CCKS2017 and 85.01% on CCKS2019.
computer science, information systems; telecommunications; engineering, electrical & electronic
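
The flat-lattice structure and span-based relative positions mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy lexicon, and the choice of four signed head/tail distances are assumptions modeled on the common flat-lattice formulation, in which every character and every lexicon-matched word becomes a span with a head and a tail index. The sign of each distance carries direction and its magnitude carries distance.

```python
from typing import List, Set, Tuple

# A span is (token, head index, tail index); for a single character head == tail.
Span = Tuple[str, int, int]


def build_flat_lattice(sentence: str, lexicon: Set[str], max_word_len: int = 4) -> List[Span]:
    """Return the character spans followed by every lexicon word found in the sentence.

    Hypothetical helper: all overlapping matches are kept, so every candidate
    segmentation stays available to the downstream model.
    """
    spans: List[Span] = [(ch, i, i) for i, ch in enumerate(sentence)]
    for start in range(len(sentence)):
        # Candidate words have length 2..max_word_len characters.
        for end in range(start + 1, min(len(sentence), start + max_word_len)):
            word = sentence[start:end + 1]
            if word in lexicon:
                spans.append((word, start, end))
    return spans


def relative_distances(a: Span, b: Span) -> Tuple[int, int, int, int]:
    """Signed distances between two spans: head-head, head-tail, tail-head, tail-tail."""
    _, head_a, tail_a = a
    _, head_b, tail_b = b
    return (head_a - head_b, head_a - tail_b, tail_a - head_b, tail_a - tail_b)


if __name__ == "__main__":
    toy_lexicon = {"右肺", "上叶", "右肺上叶"}  # illustrative entries only
    lattice = build_flat_lattice("右肺上叶", toy_lexicon)
    for span in lattice:
        print(span)
    print(relative_distances(lattice[4], lattice[6]))
```

In a full model these integer distances would typically be mapped to embeddings and fed into the attention scores; that step is omitted here because the abstract does not specify it.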