Implementing Chinese new word discovery and POS tagging based on support vector machine

Yuejie Zhang, Hui Yang, Tao Zhang
2009-01-01
Journal of Computational Information Systems
Abstract: This paper proposes a unified SVM-based mechanism for Chinese new word discovery and part-of-speech (POS) tagging. Both new word discovery and POS tagging are formulated as binary classification problems, drawing on many of the morphological features used in previous work. Several new features are introduced, such as affix information and context information, together with a set of constraints, and some rules are added to further improve identification performance. The experimental results show that these features, constraints and rules all contribute to new word discovery: the F-measure reaches 64.58%, which is 6.82% higher than the predefined baseline. Using a similar processing pattern, POS tagging of new words achieves a precision of 90.81%. 1553-9105/ Copyright © 2009 Binary Information Press.
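The abstract frames new word discovery as binary classification of candidate strings with affix and context features, evaluated by F-measure. The following is a minimal sketch of that framing, not the authors' implementation: the abstract specifies neither the feature templates nor the SVM solver or kernel. The feature names (`prefix=`, `suffix=`, `left=`, `right=`), the helper functions and the toy examples are all hypothetical. In place of an off-the-shelf SVM package, the sketch trains a linear SVM with Pegasos-style subgradient descent (hinge loss plus L2 regularisation).

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def extract_features(sentence: str, start: int, end: int) -> Features:
    """Hypothetical features for the candidate sentence[start:end]:
    affix characters, candidate length, and one character of context."""
    word = sentence[start:end]
    left = sentence[start - 1] if start > 0 else "<s>"
    right = sentence[end] if end < len(sentence) else "</s>"
    return {
        "bias": 1.0,
        "prefix=" + word[0]: 1.0,   # affix information: first character
        "suffix=" + word[-1]: 1.0,  # affix information: last character
        "len=" + str(len(word)): 1.0,
        "left=" + left: 1.0,        # context information: left neighbour
        "right=" + right: 1.0,      # context information: right neighbour
    }


def dot(w: Dict[str, float], x: Features) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data: List[Tuple[Features, int]], lam: float = 0.01,
                     epochs: int = 50) -> Dict[str, float]:
    """Pegasos-style subgradient descent on the primal linear SVM
    objective. Labels must be +1 (new word) or -1 (not a new word)."""
    w: Dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for x, y in data:  # cyclic order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # margin under the current weights
            for k in w:             # regularisation step: shrink weights
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:        # hinge-loss step on margin violations
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w: Dict[str, float], x: Features) -> int:
    return 1 if dot(w, x) >= 0.0 else -1


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, F = 2PR / (P + R)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up toy candidates: (sentence, start, end, label).
    examples = [
        ("他们在用微信聊天", 4, 6, 1),   # 微信
        ("他们在用微信聊天", 3, 5, -1),  # 用微
        ("我喜欢刷抖音", 4, 6, 1),       # 抖音
        ("我喜欢刷抖音", 3, 5, -1),      # 刷抖
    ]
    data = [(extract_features(s, i, j), y) for s, i, j, y in examples]
    w = train_linear_svm(data)
    for (s, i, j, y), (x, _) in zip(examples, data):
        print(s[i:j], "gold", y, "predicted", predict(w, x))
```

A linear model over sparse indicator features keeps each weight interpretable as the contribution of one affix or context character. The constraints and rules mentioned in the abstract would be applied on top of the classifier's decisions and are not modelled here.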
What problem does this paper attempt to address?