Parsing-based Chinese word segmentation integrating morphological and syntactic information
Xihong Wu, Meng Zhang, Xiaojun Lin
DOI: https://doi.org/10.1109/NLPKE.2011.6138178
2011-01-01
Abstract: Conventional sequence labeling methods for Chinese word segmentation do not fully exploit linguistic information, which limits further performance gains. Chinese morphology studies the construction and usage of Chinese words in depth, and this knowledge is helpful for word segmentation. Furthermore, some segmentation ambiguities cannot be resolved with lexical information alone; their final disambiguation takes place during parsing. In this paper, we propose a parsing-based Chinese word segmentation model that makes full use of morphological and syntactic information. Experiments on Penn Chinese Treebank (CTB) 5.0 show that the proposed model achieves performance competitive with a CRF-based model. To investigate the relationship between our parsing-based model and the CRF-based model, we employ a maximum entropy framework for integrating different knowledge sources. The integrated model obtains an F-measure of 97.9, a 25% reduction in segmentation error rate relative to the CRF-based model, which indicates that the two models are complementary.
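To make the reported numbers concrete, the following is a minimal Python sketch of how a word-level segmentation F-measure and a relative error rate reduction are commonly computed. It is not the authors' evaluation script: the function names (`spans`, `f_measure`, `error_rate_reduction`, `implied_baseline_f`) are hypothetical, and the sketch assumes that the segmentation error rate is defined as 100 minus the F-measure, which the abstract does not state explicitly.

```python
def spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out


def f_measure(gold, pred):
    """Word-level F-measure: a predicted word counts as correct only if
    both of its boundaries match a word in the gold segmentation."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


def error_rate_reduction(f_base, f_new):
    """Relative error reduction, with F-measures given in percent.
    Assumption: error rate = 100 - F."""
    err_base = 100.0 - f_base
    err_new = 100.0 - f_new
    return (err_base - err_new) / err_base


def implied_baseline_f(f_new, reduction):
    """Back out the baseline F-measure implied by a new F-measure and a
    relative error reduction, under the same assumption as above."""
    return 100.0 - (100.0 - f_new) / (1.0 - reduction)


# Toy example: the gold segmentation has 3 words, the prediction splits one of them.
gold = ["中国", "人民", "银行"]
pred = ["中国", "人", "民", "银行"]
toy_f = f_measure(gold, pred)

# Back-calculated from the abstract's figures; not a number reported in the abstract.
baseline_f = implied_baseline_f(97.9, 0.25)
```

Under the stated assumption, an F-measure of 97.9 together with a 25% error reduction would correspond to a baseline F-measure of roughly 97.2. This value is derived here purely by arithmetic and should be checked against the full paper.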