Applying rough sets in word segmentation disambiguation based on maximum entropy model

Wei Jiang,WANG Xiao-long,Yi Guan,LIANG Guo-hua,姜维,王晓龙,关毅,梁国华
2006-01-01
Abstract:To solve the complicated feature extraction and long distance dependency problem in Word Segmentation Disambiguation (WSD), this paper proposes to apply rough sets in WSD based on the Maximum Entropy model. Firstly, rough set theory is applied to extract the complicated features and long distance features, even from noise or inconsistent corpus. Secondly, these features are added into the Maximum Entropy model, and consequently, the feature weights can be assigned according to the performance of the whole disambiguation model. Finally, the semantic lexicon is adopted to build class-based rough set features to overcome data sparseness. The experiment indicated that our method performed better than previous models, which got top rank in WSD in 863 Evaluation in 2003. This system ranked first and second respectively in MSR and PKU open test in the Second International Chinese Word Segmentation Bakeoff held in 2005.
What problem does this paper attempt to address?