SVM-based Hybrid Pattern for New Word Discovery

Hui Yang, Yuejie Zhang, Tao Zhang
DOI: https://doi.org/10.1109/NLPKE.2007.4368013
2007-01-01
Abstract: New words pose additional challenges for Chinese word segmentation. This paper presents an SVM-based hybrid pattern for new word discovery that integrates the advantages of the statistics-based method and the rule-based method to improve discovery performance. In the statistics module, new word discovery is defined as a binary classification problem. We consider previously used new word features and propose context information and affix information as new features, together with constraints that capture the relationships among the new word candidates. Finally, some rules are introduced to further improve performance. In the experiment, new words are simulated by revising the dictionary of a Natural Language Processing (NLP) system. The results show that these features and constraints are useful for new word discovery: the F-measure is 64.62%, which is 7% higher than the baseline.
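The binary-classification framing in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the affix sets, the `Candidate` type, the candidate generator, and the feature names are all hypothetical, and the SVM itself is omitted. The sketch only shows how adjacent tokens from a segmenter's output might be combined into candidates and mapped to context and affix features, and how a balanced F-measure (assumed here to be the harmonic mean of precision and recall) could be computed.

```python
from dataclasses import dataclass

# Hypothetical affix inventories; the abstract does not list the paper's actual affixes.
PREFIXES = {"老", "小", "非"}
SUFFIXES = {"们", "者", "性", "化"}


@dataclass
class Candidate:
    text: str   # concatenated candidate string
    left: str   # token immediately to the left ("" at sentence start)
    right: str  # token immediately to the right ("" at sentence end)


def candidates(tokens, max_len=3):
    """Yield every run of 2..max_len adjacent tokens as a new-word candidate."""
    for i in range(len(tokens)):
        for n in range(2, max_len + 1):
            if i + n > len(tokens):
                break
            left = tokens[i - 1] if i > 0 else ""
            right = tokens[i + n] if i + n < len(tokens) else ""
            yield Candidate("".join(tokens[i:i + n]), left, right)


def features(c):
    """Map a candidate to a small feature dict (context plus affix cues)."""
    return {
        "length": len(c.text),
        "has_prefix": c.text[0] in PREFIXES,
        "has_suffix": c.text[-1] in SUFFIXES,
        "left_context": c.left,
        "right_context": c.right,
    }


def f_measure(predicted, gold):
    """Balanced F-measure over a set of predicted words and a set of gold words."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In a full pipeline, each feature dict would be vectorized and passed to a binary classifier such as an SVM, with each candidate labeled as a new word or not; the rules and inter-candidate constraints mentioned in the abstract would be applied around that classifier and are not modeled here.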