Chinese Text Categorization Without Word Segmentation Using String Kernel
YOU Zhi, LI Zhan-huai, ZHANG Yang
DOI: https://doi.org/10.3321/j.issn:1002-8331.2006.26.054
2006-01-01
Computer Engineering and Applications Journal
Abstract: Text categorization is the first step in extracting information from textual data. Existing methods are mainly based on statistics or machine learning, such as Bayes, KNN, SVM and neural networks, and they have proved accurate and stable in experiments on English texts. Chinese text categorization is much more difficult, however, because Chinese text has no spaces between words, and word segmentation has always been used to address this problem. This paper presents a method that applies a string kernel in a support vector machine without word segmentation, and experiments show good results.
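To illustrate how a string kernel can compare Chinese documents without segmenting them into words, here is a minimal sketch. The abstract does not say which string kernel the paper uses, so this sketch assumes a p-spectrum kernel (shared contiguous character n-grams) as a stand-in. The function names and the default `p=2` (character bigrams) are hypothetical illustration choices, not the authors' settings.

```python
import math
from collections import Counter


def spectrum(text: str, p: int) -> Counter:
    """Count every contiguous character substring of length p.

    Works directly on raw Chinese characters, so no word segmentation is needed.
    """
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s: str, t: str, p: int = 2) -> int:
    """p-spectrum kernel: k(s, t) = sum over shared p-grams u of count_s(u) * count_t(u)."""
    cs, ct = spectrum(s, p), spectrum(t, p)
    # Iterate over the smaller profile; Counter returns 0 for missing keys.
    if len(cs) > len(ct):
        cs, ct = ct, cs
    return sum(n * ct[u] for u, n in cs.items())


def normalized_kernel(s: str, t: str, p: int = 2) -> float:
    """Cosine-normalize the kernel so document length does not dominate."""
    kss = spectrum_kernel(s, s, p)
    ktt = spectrum_kernel(t, t, p)
    if kss == 0 or ktt == 0:
        return 0.0
    return spectrum_kernel(s, t, p) / math.sqrt(kss * ktt)


def gram_matrix(docs: list[str], p: int = 2) -> list[list[float]]:
    """Pairwise kernel matrix over a document collection (assumed SVM input)."""
    return [[normalized_kernel(a, b, p) for b in docs] for a in docs]


if __name__ == "__main__":
    docs = ["计算机网络技术", "网络技术发展", "中文文本分类"]
    # The first two share the bigrams 网络, 络技, 技术; the third shares none.
    print(spectrum_kernel(docs[0], docs[1]))  # 3
    print(spectrum_kernel(docs[0], docs[2]))  # 0
```

A Gram matrix like the one from `gram_matrix` can be passed to any SVM implementation that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`). The original method may instead use a gap-weighted subsequence kernel, which also matches non-contiguous character sequences at higher computational cost.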