Integrating N-gram Model Information for Chinese Word Segmentation Based on Conditional Random Fields.

Ying Xiong
DOI: https://doi.org/10.1109/icmlc.2012.6359021
2012-01-01
Abstract: This paper presents a Chinese word segmentation system based on conditional random fields (CRFs) that integrates the output of an N-gram model as CRF features. The combination is motivated by the complementary strengths of the two models: a dictionary-based N-gram model handles in-vocabulary words very well, while conditional random fields are better at recognizing out-of-vocabulary words. The approach is evaluated on the PKU data from the SIGHAN Bakeoff 2005. The experimental results show that the method achieves an F-measure of 95.0%, with higher Roov (85.2%) and Riv (97.9%).
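The idea of feeding N-gram segmentation results into a CRF can be illustrated with a minimal sketch. This is not the paper's implementation: the toy dictionary `WORD_COUNTS`, the unigram (N=1) scoring, the BMES tag set, and the feature names (`char`, `prev_char`, `next_char`, `ngram_tag`) are all assumptions chosen for illustration. The sketch segments a sentence with a dictionary-based unigram model, converts the result to per-character position tags, and attaches each tag as an extra feature alongside ordinary character features.

```python
import math

# Hypothetical toy dictionary with unigram counts (illustrative, not from the paper).
WORD_COUNTS = {
    "北京": 50, "大学": 40, "北京大学": 30, "学生": 20,
    "大": 10, "学": 8, "生": 5, "北": 2, "京": 2,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = 4                 # longest dictionary entry considered
OOV_LOG_PROB = math.log(1e-8)    # assumed penalty for an unknown single character


def ngram_segment(text):
    """Dictionary-based unigram segmentation via dynamic programming."""
    n = len(text)
    # best[i] = (best log-probability of text[:i], start index of the last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word in WORD_COUNTS:
                score = math.log(WORD_COUNTS[word] / TOTAL)
            elif len(word) == 1:
                score = OOV_LOG_PROB  # unknown characters fall back to single-character words
            else:
                continue
            candidate = best[start][0] + score
            if candidate > best[end][0]:
                best[end] = (candidate, start)
    # Walk the back-pointers to recover the word sequence.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


def to_bmes(words):
    """Convert a word sequence to per-character B/M/E/S position tags."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def char_features(text):
    """Build one feature dict per character, including the N-gram model's tag."""
    tags = to_bmes(ngram_segment(text))
    features = []
    for i, ch in enumerate(text):
        features.append({
            "char": ch,
            "prev_char": text[i - 1] if i > 0 else "<BOS>",
            "next_char": text[i + 1] if i < len(text) - 1 else "<EOS>",
            "ngram_tag": tags[i],  # the N-gram model's opinion, exposed to the CRF
        })
    return features


if __name__ == "__main__":
    for feats in char_features("北京大学生"):
        print(feats)
```

Feature dicts of this shape could be passed to a linear-chain CRF toolkit for training. Under this setup the CRF can learn to trust `ngram_tag` where the dictionary is reliable and to override it from character context where the dictionary has no matching entry.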