A Chinese named entity recognition model: integrating label knowledge and lexicon information

Yihan Yuan, Qinghua Zhang, Xiong Zhou, Man Gao
DOI: https://doi.org/10.1007/s13042-024-02207-2
2024-05-18
International Journal of Machine Learning and Cybernetics
Abstract: Chinese named entity recognition (CNER) is an important task in information extraction. CNER methods can generally be classified by text processing unit into character-granularity and word-granularity approaches. Both approaches are limited in the scenarios they suit and are susceptible to ambiguity, errors, or out-of-vocabulary words. In addition, directly formalizing entity recognition as a question answering task does not take full advantage of the knowledge carried by the labels. Therefore, this paper proposes a CNER model incorporating label knowledge and lexicon information (LkLi-CNER). The model first matches sentences against lexicons character by character and integrates the resulting lexical enhancement information directly into the BERT layer for full interaction. It then introduces prior knowledge by fusing the representation of the label description text into the enhanced text representation, so that the model can learn semantic information from the entity labels themselves. Finally, for each token, the probability of being the start and the end of each category is calculated, and the start-end pair with the highest probability is selected as the output. Experimental results show that LkLi-CNER significantly outperforms the baseline and achieves good results on four CNER datasets from different fields, which demonstrates the effectiveness of the proposed model.
computer science, artificial intelligence