A Refined HDP-Based Model for Unsupervised Chinese Word Segmentation

Wenzhe Pei, Dongxu Han, Baobao Chang
DOI: https://doi.org/10.1007/978-3-642-41491-6_5
2013-01-01
Abstract: This paper proposes a refined Hierarchical Dirichlet Process (HDP) model for unsupervised Chinese word segmentation. The model gives a better estimate of the base measure in the HDP by using a dictionary-based model. We also show that the initial segmentation state of the HDP model plays a very important role in model performance: a better initial segmentation can lead to better performance. We test our model on the PKU and MSRA datasets provided by the Second International Chinese Word Segmentation Bakeoff (SIGHAN 2005) [1], and our model outperforms the state-of-the-art systems.
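The base measure P0 is the prior an HDP falls back on when it scores a word it has rarely or never seen, so a better P0 changes which segmentations the model prefers. The Python sketch below shows where P0 enters the predictive probabilities of a two-level HDP bigram word model in Chinese-restaurant form. It is an illustration under stated assumptions, not the authors' method. The abstract does not specify the paper's dictionary-based model. The following are all hypothetical: the character-level base measure (geometric word length times character unigrams), the `dictionary_base_measure` interpolation with weight `lam`, every function name, and the toy counts. Table counts are passed in directly rather than sampled.

```python
from collections import Counter
from typing import Callable, Dict


def char_base_measure(word: str, char_prob: Dict[str, float], p_stop: float = 0.5) -> float:
    """Character-level base measure (assumed form):
    P0(w) = p_stop * (1 - p_stop)^(|w| - 1) * prod_c P(c).
    Sums to 1 over all non-empty strings when char_prob sums to 1."""
    prob = p_stop * (1.0 - p_stop) ** (len(word) - 1)
    for ch in word:
        prob *= char_prob.get(ch, 0.0)  # characters outside the alphabet get zero mass
    return prob


def dictionary_base_measure(word, char_prob, dictionary, lam=0.5, p_stop=0.5):
    """HYPOTHETICAL dictionary-informed base measure (not the paper's formula):
    interpolate a uniform distribution over dictionary words with the character model."""
    p_dict = 1.0 / len(dictionary) if word in dictionary else 0.0
    return lam * p_dict + (1.0 - lam) * char_base_measure(word, char_prob, p_stop)


def unigram_prob(word: str, table_counts: Counter, alpha0: float,
                 base: Callable[[str], float]) -> float:
    """Top-level DP predictive probability: (t_w + alpha0 * P0(w)) / (t + alpha0)."""
    total_tables = sum(table_counts.values())
    return (table_counts[word] + alpha0 * base(word)) / (total_tables + alpha0)


def bigram_prob(prev: str, word: str, bigram_counts: Counter, alpha1: float,
                p_unigram: float) -> float:
    """Context-specific DP for `prev`, backing off to the unigram-level probability."""
    n_prev = sum(c for (w1, _), c in bigram_counts.items() if w1 == prev)
    return (bigram_counts[(prev, word)] + alpha1 * p_unigram) / (n_prev + alpha1)


if __name__ == "__main__":
    # Toy alphabet, lexicon, and counts (all hypothetical).
    chars = {"中": 0.25, "国": 0.25, "人": 0.25, "民": 0.25}
    lexicon = {"中国", "人民"}
    base = lambda w: dictionary_base_measure(w, chars, lexicon)
    tables = Counter({"中国": 2, "人民": 1})
    bigrams = Counter({("中国", "人民"): 1})
    p_uni = unigram_prob("人民", tables, alpha0=1.0, base=base)
    print(bigram_prob("中国", "人民", bigrams, alpha1=1.0, p_unigram=p_uni))
```

With these toy numbers, the dictionary word 中国 receives a base-measure probability of 0.2578125 versus 0.0078125 for the non-word 国人, so the prior favors lexicon-consistent segmentations. In a sampler built this way, the initial segmentation would fix the starting `table_counts` and `bigram_counts`, which is one plausible reason the starting state matters.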