Resolution to Chinese Combinational Ambiguity Combined Corpus-Based Method with Linguistics Knowledge

JiangYang Liu, Ying Liu
DOI: https://doi.org/10.1109/fskd.2010.5569209
2010-01-01
Abstract: Combinational ambiguity is a challenging issue in Chinese word segmentation because resolving it depends on contextual information. This paper collects the contextual information of 28 typical combinational ambiguity strings. It then uses lexical, syntactic, and semantic knowledge, together with a large-scale corpus, to derive disambiguation rules for these strings. When these rules are tested on the 1996 “People's Daily” corpus, the average precision rises from 80.65% to 94.95%. The results show that the rules are effective for disambiguation.
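The rule-based idea in the abstract can be illustrated with a small sketch. It uses the two-character string 才能, which is one word in 他的才能 ("his talent") but two words, 才/能, in 他才能来 ("only then can he come"). Everything specific here is hypothetical and is not taken from the paper: the two rules, the `resolve` and `precision` helpers, and the four toy samples. The sketch shows only how context rules can select a segmentation, and how precision can be measured against a baseline that always keeps the string whole.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Context around an ambiguity string: (previous token, next token); None at sentence edges.
Context = Tuple[Optional[str], Optional[str]]


@dataclass
class Rule:
    """A hypothetical context rule for one combinational ambiguity string."""
    matches: Callable[[Context], bool]
    split_at: Optional[int]  # None keeps the string as one word; k splits it at index k


# Illustrative rules for 才能 ("talent" as one word vs. 才/能 "only then / can").
# Invented for this sketch; they are NOT the rules derived in the paper.
RULES: Dict[str, List[Rule]] = {
    "才能": [
        Rule(lambda ctx: ctx[0] == "的", split_at=None),               # 他的才能 -> 才能
        Rule(lambda ctx: ctx[1] in {"来", "去", "完成"}, split_at=1),  # 他才能来 -> 才/能
    ],
}


def resolve(s: str, ctx: Context, default_split: Optional[int] = None) -> List[str]:
    """Apply the first matching rule; otherwise fall back to the default segmentation."""
    split_at = default_split
    for rule in RULES.get(s, []):
        if rule.matches(ctx):
            split_at = rule.split_at
            break
    return [s] if split_at is None else [s[:split_at], s[split_at:]]


def precision(samples: List[Tuple[str, Context, List[str]]], use_rules: bool) -> float:
    """Fraction of samples whose predicted segmentation equals the gold segmentation."""
    correct = 0
    for s, ctx, gold in samples:
        pred = resolve(s, ctx) if use_rules else [s]  # baseline: always keep as one word
        correct += int(pred == gold)
    return correct / len(samples)


# Toy, hand-made samples (not corpus data): (string, context, gold segmentation).
samples = [
    ("才能", ("的", "很"), ["才能"]),          # 他的才能很高
    ("才能", ("他", "来"), ["才", "能"]),      # 他才能来
    ("才能", ("这样", "完成"), ["才", "能"]),  # 这样才能完成
    ("才能", ("有", "的"), ["才能"]),          # 有才能的人
]

print(precision(samples, use_rules=False))  # 0.5
print(precision(samples, use_rules=True))   # 1.0
```

Rules are tried in order and the first match wins, so more specific contexts should be listed first. The fallback (`default_split`) plays the role of whatever default segmentation is used when no rule applies.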