A morphology-based Chinese word segmentation method

Xiaojun Lin, Liang Zhao, Meng Zhang, Xihong Wu
DOI: https://doi.org/10.1109/NLPKE.2010.5587786
2010-01-01
Abstract: This paper proposes a novel method for Chinese word segmentation that uses morphological information. The method introduces morphology into a statistical model to capture the structural relationships within words, improving the ability of conventional Conditional Random Fields (CRFs) models to represent structural information. First, a word-segmented Chinese corpus is annotated with morphology tags by a semi-automatic method, and the resulting structure-related tags are integrated into the CRFs model. Second, a joint CRFs model is trained that generates both morphology tags and word boundaries. Experiments on several SIGHAN Bakeoff corpora show that morphological information significantly improves Chinese word segmentation performance, especially for out-of-vocabulary words.
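To make the idea of a joint label concrete, the following is a minimal sketch of how word boundaries and morphology tags might be folded into one per-character label, so that a single sequence model can predict both. The abstract does not give the tag inventory or the label format, so the B/M/E/S position scheme, the "AFFIX" and "SINGLE" tag names, and the `encode` and `decode` helpers are all assumptions made for illustration, not the authors' actual design.

```python
from typing import List, Tuple


def encode(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (word, morph_tag) pairs into per-character joint labels.

    The position part uses the common B/M/E/S scheme (begin, middle,
    end, single). The morphology part is a hypothetical tag name; the
    paper's real tag set is not given in the abstract.
    """
    labeled = []
    for word, morph in words:
        if len(word) == 1:
            positions = ["S"]
        else:
            positions = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        for ch, pos in zip(word, positions):
            labeled.append((ch, f"{pos}-{morph}"))
    return labeled


def decode(labeled: List[Tuple[str, str]]) -> List[str]:
    """Recover the word sequence from joint labels using only the position part."""
    words = []
    current = ""
    for ch, label in labeled:
        pos = label.split("-", 1)[0]
        # A new word starts at B or S; flush any unfinished word first.
        if pos in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # A word closes at E or S.
        if pos in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Hypothetical annotation: tag names are placeholders, not from the paper.
sentence = [("老师", "AFFIX"), ("好", "SINGLE")]
joint = encode(sentence)
segmented = decode(joint)
```

In a setup like this, a CRF trained on the joint labels would output segmentation and morphology in one pass, and the segmentation is read off by discarding the morphology part of each predicted label.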