Increasing Accuracy of Chinese Segmentation with Strategy of Multi-step Processing
Tie-jun ZHAO, Ya-juan LV, Hao YU, Mu-yun YANG, Fang LIU
DOI: https://doi.org/10.3969/j.issn.1003-0077.2001.01.002
2001-01-01
Abstract: Automatic word segmentation of Chinese sentences is difficult when the system must handle large-scale real text. The two crucial issues in Chinese segmentation are the identification of unknown words and the disambiguation of ambiguous strings. This paper describes a multi-step processing strategy that reduces these difficulties and improves segmentation accuracy. The processing comprises seven steps: disambiguation of pseudo-ambiguities, full segmentation of the sentence, determinate segmentation of some words, processing of numeral strings, processing of reduplicated words, statistical identification of unknown words, and final correction of segmentation ambiguities using part-of-speech information, which is integrated into the tagger. The results are promising, with accuracy above 98% in open tests.
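Two of the steps named in the abstract, full segmentation and numeral-string processing, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `full_segmentation` and `merge_numeral_strings`, the toy lexicon, the `NUMERALS` character set, and the rule that single characters are always allowed are all assumptions made for illustration.

```python
from functools import lru_cache

# Assumed set of numeral characters; the paper's actual inventory is not given.
NUMERALS = set("0123456789〇零一二三四五六七八九十百千万亿")


def full_segmentation(sentence, lexicon):
    """Enumerate every split of `sentence` into lexicon words.

    Single characters are always accepted (an assumption), so at least
    one segmentation exists even when the lexicon has gaps.
    """
    max_len = max((len(w) for w in lexicon), default=1)

    @lru_cache(maxsize=None)
    def split_from(start):
        # All segmentations of sentence[start:]; cached per start offset.
        if start == len(sentence):
            return [[]]
        results = []
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 1, last_end + 1):
            word = sentence[start:end]
            if end - start == 1 or word in lexicon:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


def merge_numeral_strings(words):
    """Join adjacent tokens that consist only of numeral characters."""
    merged = []
    for word in words:
        is_numeral = all(ch in NUMERALS for ch in word)
        prev_is_numeral = bool(merged) and all(ch in NUMERALS for ch in merged[-1])
        if is_numeral and prev_is_numeral:
            merged[-1] += word
        else:
            merged.append(word)
    return merged


# Toy lexicon for the classic ambiguous string "研究生命起源".
lexicon = {"研究", "研究生", "生命", "起源"}
candidates = full_segmentation("研究生命起源", lexicon)
# Both competing readings appear among the candidates:
#   ["研究", "生命", "起源"] and ["研究生", "命", "起源"]

print(merge_numeral_strings(["二", "千", "零", "一", "年"]))  # ['二千零一', '年']
```

Enumerating all candidates keeps both readings of an ambiguous string available, so a later step can choose between them; the abstract assigns that choice to the final part-of-speech-based correction, which this sketch does not attempt.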