Chinese Document Categorization without Dictionary Support and Segmentation Processing

Shuigeng Zhou, Jihong Guan, Yunfa Hu
DOI: https://doi.org/10.3321/j.issn:1002-0470.2001.03.007
2001-01-01
Abstract: A new approach is proposed that uses information about adjacent Chinese character pairs to categorize Chinese documents. This frees the classifiers from any need for dictionaries or word segmentation, which makes them independent of domain and time. A Chinese document categorization system based on this approach is implemented with the Naive Bayes and kNN methods, and experimental results show that it achieves satisfactory categorization performance.
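The sketch below illustrates the general idea under stated assumptions. Every adjacent pair of Chinese characters is used as a feature, so no dictionary or segmenter is involved, and the resulting counts are fed to a multinomial Naive Bayes classifier and to a kNN classifier. The abstract does not specify the authors' feature weighting, smoothing, similarity measure, feature selection, or choice of k. The following choices are therefore assumptions and not the paper's method: raw pair counts, add-one (Laplace) smoothing, cosine similarity, similarity-weighted kNN votes, and treating only the CJK Unified Ideographs range U+4E00 to U+9FFF as Chinese characters. The names `char_pairs`, `PairNaiveBayes`, and `knn_predict` and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_pairs(text):
    """Count adjacent Chinese character pairs; no dictionary or segmentation.

    Assumption: only CJK Unified Ideographs (U+4E00..U+9FFF) count as Chinese
    characters; any other character breaks the run, so no pair spans it.
    """
    pairs = Counter()
    prev = None
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            if prev is not None:
                pairs[prev + ch] += 1
            prev = ch
        else:
            prev = None
    return pairs


class PairNaiveBayes:
    """Multinomial Naive Bayes over character-pair counts (add-one smoothing assumed)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.pair_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.pair_counts[label].update(char_pairs(doc))
        self.vocab = set()
        for counts in self.pair_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.pair_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        feats = char_pairs(doc)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # log prior + sum of log likelihoods of each pair occurrence
            score = math.log(self.class_counts[c] / self.n_docs)
            for pair, n in feats.items():
                p = (self.pair_counts[c][pair] + 1) / (self.totals[c] + v)
                score += n * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(n * b[k] for k, n in a.items() if k in b)
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train_docs, train_labels, doc, k=3):
    """kNN: the k most similar training documents vote, weighted by similarity."""
    query = char_pairs(doc)
    neighbours = sorted(
        ((cosine(query, char_pairs(d)), lbl) for d, lbl in zip(train_docs, train_labels)),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, lbl in neighbours:
        votes[lbl] += sim
    return max(votes, key=votes.get)


# Hypothetical toy corpus, for illustration only.
train_docs = ["股票市场今天上涨", "银行利率下降股票", "足球比赛今天开始", "篮球比赛球员得分"]
train_labels = ["finance", "finance", "sports", "sports"]

nb = PairNaiveBayes().fit(train_docs, train_labels)
print(nb.predict("股票价格上涨"))                               # finance
print(knn_predict(train_docs, train_labels, "足球比赛结果"))    # sports
```

Because the features are plain character bigrams, new terms or domain-specific words missing from any dictionary still produce usable pairs. This is the property the abstract relies on for domain and time independence. In a fuller system one would typically add feature selection and a proper evaluation split, but those details are not given in the abstract.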