Abstract: With the rapid development of the Internet, the way people obtain information has changed tremendously. In recent years, knowledge graphs have become a popular tool for the public to acquire knowledge. For knowledge graphs of Chinese history and culture, most researchers have adopted traditional named entity recognition methods to extract entity information from unstructured historical text. However, traditional named entity recognition methods have certain defects and tend to ignore the associations between entities. To extract entities from large amounts of historical and cultural information more accurately and efficiently, this paper proposes a named entity recognition model that combines Bidirectional Encoder Representations from Transformers with Bidirectional Long Short-Term Memory and a Conditional Random Field (BERT-BiLSTM-CRF). First, a BERT pre-trained language model encodes each character to obtain its vector representation. Then a Bidirectional Long Short-Term Memory (BiLSTM) layer semantically encodes the input text. Finally, the Conditional Random Field (CRF) layer outputs the label with the highest probability to obtain each character’s category. The model uses the BERT pre-trained language model in place of static word vectors trained in the traditional way. In comparison, BERT generates semantic vectors dynamically according to the context of words, which improves the representational ability of the word vectors. Experimental results show that the proposed model achieves excellent results on named entity recognition in the field of historical culture. Compared with existing named entity recognition methods, precision, recall, and $F_1$ are significantly improved.
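The final CRF step described above can be illustrated with a small Viterbi decoder. This is a minimal sketch and not the paper's implementation: the label set, the emission scores (which would come from the BERT and BiLSTM layers in the full model), and the transition scores (which a CRF would learn during training) are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence

# Hypothetical label set for illustration only.
LABELS = ["O", "B-PER", "I-PER"]

# Hypothetical transition scores: TRANSITIONS[i][j] scores moving from
# label i to label j. A trained CRF would learn these; here O -> I-PER
# is heavily penalized because an entity cannot start with I-PER.
TRANSITIONS = [
    [0.5, 0.0, -10.0],  # from O
    [0.0, -1.0, 1.0],   # from B-PER
    [0.0, -1.0, 0.5],   # from I-PER
]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the label indices of the highest-scoring path.

    emissions[t][j] is the score of label j for character t.
    transitions[i][j] is the score of label i followed by label j.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # score[j]: best score of any path ending in label j at the current character.
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_labels):
            # Best previous label for reaching label j at character t.
            best_i = max(range(num_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final label to recover the path.
    path = [max(range(num_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical emission scores for a two-character input.
emissions = [
    [1.0, 0.9, 0.0],  # character 1: O scores slightly higher than B-PER
    [0.0, 0.0, 2.0],  # character 2: strongly I-PER
]
decoded = [LABELS[i] for i in viterbi_decode(emissions, TRANSITIONS)]
print(decoded)  # ['B-PER', 'I-PER']
```

Picking the best label per character independently would give O followed by I-PER here, which is not a valid tag sequence. Scoring whole sequences with transition terms lets the decoder prefer B-PER followed by I-PER instead.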

AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation

GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

A complex network approach to analyse pre-trained language models for ancient Chinese

Guwen-UNILM: Machine Translation Between Ancient and Modern Chinese Based on Pre-Trained Models

Can Large Language Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE

Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models

SikuGPT: A Generative Pre-trained Model for Intelligent Information Processing of Ancient Texts from the Perspective of Digital Humanities

Chinese Named Entity Recognition Method in History and Culture Field Based on BERT

AC-EVAL: Evaluating Ancient Chinese Language Understanding in Large Language Models

A cross-temporal contrastive disentangled model for ancient Chinese understanding

Ancient Text Translation Model Optimized with GujiBERT and Entropy-SkipBERT

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Towards Effective Ancient Chinese Translation: Dataset, Model, and Evaluation

Ancient-Modern Chinese Translation with a Large Training Dataset

What does Chinese BERT learn about syntactic knowledge?

RoChBert: Towards Robust BERT Fine-tuning for Chinese

StyleBERT: Chinese pretraining by font style information