Statistical Chinese Chunking Model Based on Word Clustering Features

Guang-lu SUN, Xiao-long WANG, Bing-quan LIU, Yi GUAN
DOI: https://doi.org/10.3321/j.issn:0372-2112.2008.12.033
2008-01-01
Abstract: An entropy-based hierarchical word clustering algorithm is proposed, and the word clusters it generates are used as features in a Chinese chunking model. Based on the chunk tags of words and on entropy theory, a binary hierarchical clustering algorithm is applied to the words in a Chinese chunking corpus, and an acceleration algorithm is employed to reduce clustering time. Together with named entity and factoid recognition, a new Chinese chunking system is built on maximum entropy Markov models, in which part-of-speech features are replaced with the entropy-based word clustering features. Experimental results show that the algorithm increases the efficiency of word clustering and that the entropy-based word clustering features effectively improve Chinese chunking performance.
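The abstract does not spell out the clustering criterion, so the following is only a minimal sketch of one plausible reading: a greedy bottom-up binary clustering that, at each step, merges the two clusters whose union adds the least chunk-tag entropy. The function names (`weighted_entropy`, `cluster_words`), the bottom-up direction, and the exhaustive pair search are all assumptions made for illustration. This is not the authors' algorithm, and it omits their acceleration step.

```python
import math
from collections import Counter
from itertools import combinations


def weighted_entropy(tag_counts):
    """Return n * H(tags) in bits for one cluster, where n is its token count."""
    n = sum(tag_counts.values())
    return -sum(c * math.log2(c / n) for c in tag_counts.values() if c > 0)


def cluster_words(word_tag_counts, num_clusters):
    """Greedily merge word clusters until num_clusters remain (assumed criterion).

    word_tag_counts maps each word to a Counter of the chunk tags it carries
    in the corpus. Each merge joins exactly two clusters, so the recorded
    merge sequence describes a binary hierarchy.
    """
    clusters = {(w,): Counter(tc) for w, tc in word_tag_counts.items()}
    merges = []
    while len(clusters) > num_clusters:
        best = None
        # Exhaustive search over all pairs: simple but slow for large vocabularies.
        for a, b in combinations(sorted(clusters), 2):
            merged = clusters[a] + clusters[b]
            # Increase in total weighted tag entropy caused by this merge.
            cost = (weighted_entropy(merged)
                    - weighted_entropy(clusters[a])
                    - weighted_entropy(clusters[b]))
            if best is None or cost < best[0]:
                best = (cost, a, b, merged)
        _, a, b, merged = best
        del clusters[a], clusters[b]
        clusters[tuple(sorted(a + b))] = merged
        merges.append((a, b))
    return clusters, merges
```

Under this reading, the cluster a word ends up in (or its path through the merge hierarchy) could stand in for the part-of-speech tag as a feature in the chunker. The pair search above rescans every pair after each merge, which is the kind of cost an acceleration step would aim to reduce.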