Unified Framework of Performing Chinese Word Segmentation and Part-Of-Speech Tagging

Zhang Kaixu, Sun Maosong
2012-01-01
China Communications
Abstract: The paper proposes a unified framework that combines the advantages of the fast one-at-a-time approach and the high-performance all-at-once approach to Chinese Word Segmentation (CWS) and Part-of-Speech (PoS) tagging. In this framework, the input of the PoS tagger is a candidate set of several CWS results provided by the CWS model. The widely used one-at-a-time and all-at-once approaches are the two extreme cases of the proposed candidate-based approach. Experiments on Penn Chinese Treebank 5 and Tsinghua Chinese Treebank show that the generalized candidate-based approach outperforms the one-at-a-time approach and even the all-at-once approach. The candidate-based approach is also faster than the time-consuming all-at-once approach. The authors compare three methods of generating the candidate set, based on sentences, words, and character intervals. The word-based method performs best.
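The candidate-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' models or features: the alphabet, the scores in SEG_SCORE and TAG_SCORE, and all function names are hypothetical toy values chosen for illustration. It also assumes the candidate set is the k best whole-sentence segmentations, which is only one of the three generation methods the abstract mentions. With k = 1 the sketch behaves like a one-at-a-time pipeline; with k large enough to cover every segmentation it approximates an all-at-once search.

```python
# Toy illustration only: all scores and names below are hypothetical,
# not taken from the paper.

# Hypothetical segmentation score for each known word.
SEG_SCORE = {"a": 1.0, "b": 1.0, "c": 1.0, "ab": 3.0, "bc": 3.5}

# Hypothetical PoS score for each (word, tag) pair.
TAG_SCORE = {
    "a": {"N": 0.5, "V": 0.4},
    "b": {"V": 1.0},
    "c": {"N": 1.0, "V": 2.5},
    "ab": {"N": 2.0},
    "bc": {"N": 1.5},
}


def segmentations(text):
    """Yield every split of text into words known to SEG_SCORE."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in SEG_SCORE:
            for rest in segmentations(text[end:]):
                yield [word] + rest


def seg_score(words):
    """Score a segmentation as the sum of its word scores."""
    return sum(SEG_SCORE[w] for w in words)


def top_k_candidates(text, k):
    """Return the k highest-scoring segmentations (the candidate set)."""
    ranked = sorted(segmentations(text), key=seg_score, reverse=True)
    return ranked[:k]


def best_tagging(words):
    """Pick the best tag for each word independently; return (tags, score)."""
    tags, total = [], 0.0
    for w in words:
        tag, score = max(TAG_SCORE[w].items(), key=lambda kv: kv[1])
        tags.append(tag)
        total += score
    return tags, total


def segment_and_tag(text, k):
    """Tag each of the k candidates and keep the best joint score."""
    best_joint, best_result = None, []
    for words in top_k_candidates(text, k):
        tags, tag_score = best_tagging(words)
        joint = seg_score(words) + tag_score
        if best_joint is None or joint > best_joint:
            best_joint, best_result = joint, list(zip(words, tags))
    return best_result
```

With these toy scores, segment_and_tag("abc", 1) commits to the segmenter's top choice and returns a/N bc/N, while segment_and_tag("abc", 2) lets the tagger's scores overturn that choice and returns ab/N c/V. The size of the candidate set k is the knob that trades speed against the chance of recovering from a segmentation error.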