Abstract: Named Entity Recognition (NER) systems have advanced considerably over the past decade thanks to deep neural networks. However, state-of-the-art NER methods have rarely been applied to Chinese historical texts, owing to the lack of standard corpora in Chinese historical domains and the difficulty of obtaining a high-quality ancient corpus. This paper addresses both issues and proposes an efficient automatic processing solution for NER on ancient Chinese data, comprising a data-driven tagging procedure and a novel end-to-end network named "MoGCN" (Mixture of Gated Convolutional Neural Network). Our tagging approach is used to generate a corpus covering three genres of Chinese historical classics, on which we run experiments to assess the generalization ability of the proposed model. The empirical analysis shows that the proposed model achieves the best results on this dataset, improving the F1-score by more than 1.5% over other sophisticated models, and that experimental performance depends positively on the quality of the corpus. Furthermore, our auxiliary attribute analysis shows that the model performs much better on shorter entities, especially 2-character ones, while many long-range entities are identified only by our model. This work serves as a preliminary exploration of NER for historical data, offering insights and a reference for similar tasks. Future work should further explore NER optimization on large volumes of Chinese traditional texts using linguistic features and learning strategies.

A Sentence Segmentation Method for Ancient Chinese Texts Based on NNLM

Ancient Chinese Sentence Segmentation Based on Bidirectional LSTM+CRF Model

Automatic sentence segmentation for classical Chinese: The Spring and Autumn Annals as an example

Citation Metadata Extraction Via Deep Neural Network-based Segment Sequence Labeling

CRF-based Approach to Sentence Segmentation and Punctuation for Ancient Chinese Prose

When Classical Chinese Meets Machine Learning: Explaining the Relative Performances of Word and Sentence Segmentation Tasks

Long Short-Term Memory Neural Networks for Chinese Word Segmentation

A morphology-based Chinese word segmentation method

Improving Chinese Word Segmentation Using Partially Annotated Sentences

Classical Chinese Sentence Segmentation for Tomb Biographies of Tang Dynasty

Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Distant Supervision

Parsing-based Chinese word segmentation integrating morphological and syntactic information

Neural Word Segmentation Learning for Chinese

Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation

Neural Chinese Word Segmentation with Dictionary Knowledge

MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts

Chinese Word Segmentation Without Using Lexicon and Hand-Crafted Training Data

Sentence Segmentation for Classical Chinese Based on LSTM with Radical Embedding

Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora

Chinese Word Segmentation Via BiLSTM+Semi-CRF with Relay Node

Deep Learning for Chinese Word Segmentation and POS Tagging