How unsupervised learning affects character tagging based Chinese Word Segmentation: A quantitative investigation

Yan Song, Chunyu Kit, Ruifeng Xu, Hai Zhao
DOI: https://doi.org/10.1109/ICMLC.2009.5212769
2009-01-01
Abstract: Integrating global information from unsupervised segmentation into Conditional Random Fields (CRF) learning has proven effective for enhancing the performance of character-tagging-based Chinese Word Segmentation. By comparing CRF models with and without the unsupervised learning enhancement, we investigate how unsupervised learning affects performance. In particular, two kinds of segmented words, in-vocabulary and out-of-vocabulary words, are analyzed separately, case by case, to see which of those words are affected by unsupervised learning. In addition, the cost of the additional features derived from unsupervised segmentation is also taken into account and evaluated.
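For readers unfamiliar with the setup, the following is a minimal sketch of the two ideas the abstract relies on: casting segmentation as per-character tagging, and scoring in-vocabulary (IV) and out-of-vocabulary (OOV) words separately. It is an illustration under stated assumptions, not the authors' implementation: the four-tag B/M/E/S scheme, the function names, and the toy sentence are all hypothetical, and neither the CRF model nor the unsupervised segmentation features are modeled here.

```python
def words_to_tags(words):
    """Map a segmented sentence to per-character tags.

    Assumed tag set (not taken from the paper): B = word-initial,
    M = word-internal, E = word-final, S = single-character word.
    """
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def spans(words):
    """Return the set of (start, end) character offsets, one per word."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def iv_oov_recall(gold, pred, vocab):
    """Recall of gold words, reported separately for IV and OOV words.

    A gold word counts as recalled only if the predicted segmentation
    contains a word with exactly the same character span.
    """
    pred_spans = spans(pred)
    hit = {"iv": 0, "oov": 0}
    total = {"iv": 0, "oov": 0}
    pos = 0
    for w in gold:
        kind = "iv" if w in vocab else "oov"
        total[kind] += 1
        if (pos, pos + len(w)) in pred_spans:
            hit[kind] += 1
        pos += len(w)
    # None marks a category with no gold words (recall undefined).
    return {k: (hit[k] / total[k] if total[k] else None) for k in total}


# Toy example: the training vocabulary lacks the word "自然语言".
gold = ["我们", "喜欢", "自然语言", "处理"]
pred = ["我们", "喜欢", "自然", "语言", "处理"]
vocab = {"我们", "喜欢", "处理"}

print(words_to_tags(gold))
print(iv_oov_recall(gold, pred, vocab))
```

Splitting recall this way is what makes it possible to ask whether an added feature source helps mainly on unseen words, on known words, or on both.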