Integrated Chinese Word Segmentation and Part-of-speech Tagging Based on the Divide-and-conquer Strategy

MS Sun, DL Xu, BK Tsou
DOI: https://doi.org/10.1109/nlpke.2003.1275978
2003-01-01
Abstract: This paper tests and compares several ways of integrating Chinese word segmentation and part-of-speech tagging, including so-called true integration and pseudo-integration, on a test corpus of 367,114 Chinese characters. A novel true-integration approach, named 'the divide-and-conquer integration', is proposed. Preliminary experiments show that it achieves 98.72% accuracy in word segmentation, 95.65% accuracy in part-of-speech tagging, and 94.43% accuracy in word segmentation and part-of-speech tagging combined, outperforming all the other combinations tested, though not by a very significant margin. The results demonstrate the potential for further improving the performance of Chinese word segmentation and part-of-speech tagging.