ENER: Named Entity Recognition Model for Ethnic Ancient Books Based on Entity Boundary Detection.

Lifeng Zhao, Ziquan Feng, Na Sun, Yong Lu
DOI: https://doi.org/10.1007/978-3-031-51671-9_4
2024-01-01
Abstract: Entity identification rules in the field of ethnic ancient books differ significantly from those assumed by existing methods. As a result, general-purpose models identify domain-specific terms with poor accuracy in entity extraction tasks, and they fail to use boundary information effectively to resolve the ambiguity and nesting of Chinese entities. In this paper, we construct a small-scale named entity corpus of ethnic ancient books and propose an Ethnic Naming Entity Recognition (ENER) model that integrates entity boundary detection. In ENER, a BERT model is used for pre-training on the annotated ancient book text, and a Bidirectional Gated Recurrent Unit (BiGRU) encodes the contextual features of the ancient books. On top of the named entity recognition task, an auxiliary entity boundary detection task is added to enhance the model's ability to identify entity boundaries, and a Conditional Random Field (CRF) generates the named entity tag sequence of the ancient books. Experiments on the ancient book named entity corpus and on general Chinese data sets show the effectiveness of our approach. On the one hand, ENER improves accuracy, recall, and F1 by 2.09%, 1.62%, and 1.85%, respectively, compared with the baseline BERT-BiLSTM-CRF model, and it achieves higher scores than the other models. On the other hand, ENER recognizes ancient book named entities more effectively on the small-scale corpus and remains stable on general Chinese data sets. It can be applied to text containing domain-specific terms in the ethnic field and extended to more tasks in the future.
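The auxiliary boundary detection task described in the abstract needs its own label sequence alongside the entity tags. The following is a minimal sketch of how such labels could be derived and how two task losses might be combined. It is not the authors' implementation: the BIO input scheme, the B/I/E/S/O boundary scheme, the function names, and the `weight` parameter are all assumptions made for illustration, and the example sentence is a toy one, not taken from the corpus.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_inclusive, entity_type) for each entity in a BIO sequence."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # An open span continues only on an I- tag of the same entity type.
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i - 1, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags) - 1, etype))
    return spans


def boundary_labels(tags: List[str]) -> List[str]:
    """Derive type-agnostic boundary labels (assumed B/I/E/S/O scheme) from BIO tags."""
    labels = ["O"] * len(tags)
    for start, end, _ in bio_to_spans(tags):
        if start == end:
            labels[start] = "S"          # single-character entity
        else:
            labels[start] = "B"          # first character of the entity
            labels[end] = "E"            # last character of the entity
            for k in range(start + 1, end):
                labels[k] = "I"          # interior characters
    return labels


def joint_loss(ner_loss: float, boundary_loss: float, weight: float = 0.5) -> float:
    """Hypothetical multi-task objective: main NER loss plus a weighted auxiliary loss."""
    return ner_loss + weight * boundary_loss


# Toy example: a five-character sentence with one person and one location entity.
chars = ["张", "三", "到", "北", "京"]
ner_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(list(zip(chars, ner_tags, boundary_labels(ner_tags))))
```

In a setup like this, the shared encoder would feed two output heads, one trained on the entity tags and one on the derived boundary labels, so that boundary errors are penalized explicitly.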