Chinese named entity recognition via word boundary based character embedding

Lin YAO,Yi LIU,Xinxin LI,Hong LIU
DOI: https://doi.org/10.11992/tis.201507065
2016-01-01
Abstract: Most machine-learning-based Chinese named entity recognition systems rely on a large number of manually extracted features, and feature extraction is time-consuming and laborious. To remove this dependence on feature extraction, this paper presents a Chinese named entity recognition system based on word-boundary-based character embedding. The method automatically extracts feature information from a large amount of unlabeled data and generates word feature vectors, which are then used to train a neural network. Because Chinese characters are not the most basic unit of Chinese semantics, a simple character-level vector causes semantic ambiguity. Since the same character may have different meanings at different positions within a word, this paper proposes a character vector method that incorporates word boundary information, constructs a deep neural network system for Chinese named entity recognition, and achieves an F1 score of 89.18% on the SIGHAN Bakeoff-3 2006 MSRA corpus. This result is close to state-of-the-art performance and shows that the system can avoid relying on feature extraction and reduce character ambiguity.
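The idea of attaching word boundary information to characters can be illustrated with a minimal sketch. The abstract does not specify the tagging scheme, so the B/M/E/S position tags (begin, middle, end, single), the `tag_chars_with_boundaries` function name, and the assumption that the text has already been segmented into words are all hypothetical choices made for illustration, not the authors' implementation. Each character is turned into a position-specific token, so that an embedding model trained on these tokens would learn a separate vector for the same character at different positions in a word.

```python
def tag_chars_with_boundaries(words):
    """Turn a segmented sentence into position-tagged character tokens.

    Assumption: `words` is a list of already-segmented words. The B/M/E/S
    tag set is an illustrative choice, not taken from the paper.
    """
    tagged = []
    for word in words:
        if len(word) == 1:
            # A single-character word gets the S (single) tag.
            tagged.append(word + "_S")
            continue
        last = len(word) - 1
        for i, ch in enumerate(word):
            if i == 0:
                tag = "B"  # first character of a multi-character word
            elif i == last:
                tag = "E"  # last character of a multi-character word
            else:
                tag = "M"  # any character in between
            tagged.append(ch + "_" + tag)
    return tagged


# Hypothetical segmented input: the character 京 ends one word and begins another.
tokens = tag_chars_with_boundaries(["北京", "的", "京剧"])
print(tokens)
```

In this sketch, 京 becomes two distinct tokens (one tagged E, one tagged B), so a downstream embedding table would hold two vectors for it rather than one shared, ambiguous vector.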