A Classification-based Algorithm for Consistency Check of Part-of-Speech Tagging for Chinese Corpora.

Hu Zhang, Jia-heng Zheng, Ying Zhao
2005-01-01
Abstract: Ensuring the consistency of part-of-speech (POS) tagging plays an important role in building high-quality Chinese corpora. After analyzing how multi-category words (words that can take more than one part of speech) are tagged in large-scale corpora, we propose a novel method for checking the consistency of POS tagging. The method builds a vector model of the contexts in which multi-category words occur, constructs context vectors from POS tagging sequences, and uses the k-NN algorithm to classify these vectors and judge whether the tagging is consistent. Experimental results indicate that the proposed method is feasible and effective.
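The k-NN consistency check described in the abstract can be sketched as follows. This is a minimal illustration under assumptions made for the example, not the authors' implementation: the window of two tags on each side, the binary (offset, tag) features, cosine similarity, majority voting, k = 3, the PKU-style tags, and the example sentences are all hypothetical. An occurrence of a multi-category word is flagged as inconsistent when the tag predicted from its k nearest reference contexts differs from the tag it was actually given.

```python
import math
from collections import Counter

# Hypothetical window size: two POS tags on each side of the target word.
WINDOW = 2


def context_vector(tags, i, window=WINDOW):
    """Sparse binary vector of (relative offset, POS tag) features around position i.

    Positions outside the sentence get the placeholder tag '<PAD>'.
    """
    vec = {}
    for off in range(-window, window + 1):
        if off == 0:
            continue  # the target word's own tag is the one being checked
        j = i + off
        tag = tags[j] if 0 <= j < len(tags) else "<PAD>"
        vec[(off, tag)] = 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(key, 0.0) for key, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def collect_examples(tagged_sentences, word, window=WINDOW):
    """(context vector, assigned tag) pairs for every occurrence of `word`."""
    examples = []
    for sent in tagged_sentences:
        tags = [t for _, t in sent]
        for i, (w, t) in enumerate(sent):
            if w == word:
                examples.append((context_vector(tags, i, window), t))
    return examples


def knn_tag(vec, examples, k=3):
    """Majority tag among the k reference contexts most similar to vec."""
    ranked = sorted(examples, key=lambda ex: cosine(vec, ex[0]), reverse=True)
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]


def check_consistency(tagged_sentence, word, examples, k=3, window=WINDOW):
    """List (position, assigned tag, predicted tag) wherever the two disagree."""
    tags = [t for _, t in tagged_sentence]
    flagged = []
    for i, (w, t) in enumerate(tagged_sentence):
        if w != word:
            continue
        predicted = knn_tag(context_vector(tags, i, window), examples, k)
        if predicted != t:
            flagged.append((i, t, predicted))
    return flagged


# Hypothetical reference data for the multi-category word 研究 (verb or noun),
# assumed to be correctly tagged.
reference = [
    [("我们", "r"), ("研究", "v"), ("问题", "n")],
    [("他", "r"), ("正在", "d"), ("研究", "v"), ("方法", "n")],
    [("科学", "n"), ("研究", "n"), ("的", "u"), ("成果", "n")],
    [("这", "r"), ("项", "q"), ("研究", "n"), ("的", "u"), ("意义", "n")],
]
examples = collect_examples(reference, "研究")

# 研究 tagged as a verb in a context where the reference data tag it as a noun.
suspect = [("经济", "n"), ("研究", "v"), ("的", "u"), ("进展", "n")]
print(check_consistency(suspect, "研究", examples))  # [(1, 'v', 'n')]
```

Voting over several neighbours rather than copying the single closest context means one mis-tagged reference example does not by itself decide the prediction, which is why this sketch uses k > 1.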