An integrated approach to Chinese word segmentation and part-of-speech tagging

Maosong Sun, Dongliang Xu, Benjamin K. Tsou, Huaming Lu
DOI: https://doi.org/10.1007/11940098_31
2006-01-01
Abstract: This paper discusses and compares various schemes for integrating Chinese word segmentation and part-of-speech tagging, within the framework of true integration and pseudo-integration. A true-integration approach, named ‘the divide-and-conquer integration’, is presented. Experiments on a manually word-segmented and part-of-speech-tagged corpus of about 5.8 million words show that this true integration achieves 98.61% F-measure in word segmentation, 95.18% F-measure in part-of-speech tagging, and 93.86% F-measure in word segmentation and part-of-speech tagging combined, outperforming all other kinds of combinations to some extent. The experimental results demonstrate the potential for further improving the performance of Chinese word segmentation and part-of-speech tagging.