Named Entity Recognition for Chinese Novels in the Ming-Qing Dynasties.

Yunfei Long, Dan Xiong, Qin Lu, Minglei Li, Chu-Ren Huang
DOI: https://doi.org/10.1007/978-3-319-49508-8_34
2016-01-01
Abstract: This paper presents a Named Entity Recognition (NER) system for Chinese classic novels of the Ming and Qing dynasties using Conditional Random Fields (CRFs). An annotated corpus of four influential vernacular novels produced during this period serves as both training and testing data: in each experiment, three novels are used for training and the remaining novel is used for testing. Three sets of features are proposed for the CRF model: (1) a baseline feature set, that is, word/POS and bigram features for different window sizes, (2) dependency head and dependency relationship, and (3) Wikipedia categories. The F-measures for the four books range from 67% to 80%. Experiments show that adding the dependency head and relationship, as well as Wikipedia categories, improves the performance of the NER system. Of the two additions, the Wikipedia categories produce the greater improvement.
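To make the three feature sets concrete, the following is a minimal sketch of how per-token features of this kind could be assembled before being passed to a CRF toolkit. It is not the authors' implementation: the `Token` fields, the feature names, the padding symbols, and the toy sentence with its POS tags, dependency labels, and category values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    """Hypothetical token record; field names are illustrative assumptions."""
    word: str
    pos: str
    head: int                   # index of the dependency head, -1 for the root
    deprel: str                 # dependency relation to the head
    wiki: Optional[str] = None  # Wikipedia category if the word has one


def token_features(sent: List[Token], i: int, window: int = 2) -> Dict[str, str]:
    """Build a feature dict for token i of a sentence."""
    feats: Dict[str, str] = {"bias": "1"}
    n = len(sent)

    # (1) Baseline: word and POS for every position in the window.
    for off in range(-window, window + 1):
        j = i + off
        inside = 0 <= j < n
        feats[f"w[{off}]"] = sent[j].word if inside else "<PAD>"
        feats[f"pos[{off}]"] = sent[j].pos if inside else "<PAD>"

    # (1) Baseline: bigrams of adjacent words inside the window.
    for off in range(-window, window):
        left, right = feats[f"w[{off}]"], feats[f"w[{off + 1}]"]
        feats[f"w[{off}]|w[{off + 1}]"] = f"{left}|{right}"

    # (2) Dependency head word and dependency relation.
    tok = sent[i]
    feats["head_word"] = sent[tok.head].word if tok.head >= 0 else "<ROOT>"
    feats["deprel"] = tok.deprel

    # (3) Wikipedia category, with a placeholder when none is known.
    feats["wiki"] = tok.wiki if tok.wiki is not None else "<NONE>"
    return feats


# Toy sentence; tags and categories are invented for this example only.
sentence = [
    Token("宝玉", "NR", head=1, deprel="nsubj", wiki="fictional_character"),
    Token("来到", "VV", head=-1, deprel="root"),
    Token("荣国府", "NR", head=1, deprel="dobj", wiki="fictional_location"),
]
features = [token_features(sentence, i, window=1) for i in range(len(sentence))]
```

Each resulting dictionary corresponds to one token, and a list of such dictionaries per sentence is the input shape that common CRF toolkits accept. Varying `window` mirrors the abstract's mention of different window sizes.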