Enhancing Chinese Word Segmentation With Character Clustering

Yijia Liu, Wanxiang Che, Ting Liu
DOI: https://doi.org/10.1007/978-3-642-41491-6_6
2013-01-01
Abstract: In the semi-supervised learning framework, clustering has proved to be a helpful feature for improving system performance in NER and other NLP tasks. However, no previous work has employed clustering in word segmentation. In this paper, we propose a new approach that computes clusters of characters and uses them to assist a character-based Chinese word segmentation system. To address character ambiguity, contextual information is taken into account when the character clustering algorithm is run. Experiments show that our character clusters improve performance. We also compare our cluster features with the widely used mutual information (MI) feature. When the two features are integrated, further improvement is achieved.
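The abstract does not say how the MI feature is computed. As an illustration only, the sketch below assumes a common formulation: pointwise mutual information between adjacent characters, estimated from raw counts over unlabeled text. The function name `bigram_pmi` and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bigram_pmi(sentences):
    """Return {(a, b): PMI} for every adjacent character pair in `sentences`.

    Assumption: PMI(a, b) = log(p(ab) / (p(a) * p(b))), with probabilities
    estimated by relative frequency. The paper may define MI differently.
    """
    unigrams = Counter()
    bigrams = Counter()
    for s in sentences:
        unigrams.update(s)              # count single characters
        bigrams.update(zip(s, s[1:]))   # count adjacent character pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi


# Toy unlabeled corpus (hypothetical data, for illustration only).
corpus = ["中国人民", "中国银行", "人民银行"]
scores = bigram_pmi(corpus)
```

In this toy corpus, the pair inside a word (中, 国) scores higher than the pair that straddles a word boundary (国, 人). That contrast is the kind of signal an MI feature can give a character-based segmenter.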