Chinese Documents Categorization Based on N-gram Information

Shui-geng Zhou, Hong-qi Yu, Yun-fa Hu, Ji-hong Guan
DOI: https://doi.org/10.3969/j.issn.1003-0077.2001.01.005
2001-01-01
Abstract: Traditional document classifiers are based on the keywords in documents, so they require dictionary support and an efficient word-segmentation procedure. This paper explores how N-gram information can be used to categorize Chinese documents. This frees classifiers from the burden of large dictionaries and complex segmentation procedures, which in turn makes them domain- and time-independent. A Chinese document categorization system of this kind is implemented with the kNN classification method. Experimental results show that it can achieve performance comparable to that of other classifiers of the same type.
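The abstract does not specify the N-gram length, the term weighting, or the value of k. The following minimal Python sketch therefore shows only the general idea under assumed choices: overlapping character bigrams as features (no dictionary and no word segmentation), raw counts compared by cosine similarity, and similarity-weighted voting among the k nearest training documents. The function names, the toy training data, and the defaults `n=2` and `k=3` are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter, defaultdict
import math

def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams; no dictionary or word segmentation needed."""
    # Drop whitespace so spaces and line breaks do not create spurious n-grams.
    chars = [c for c in text if not c.isspace()]
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(doc: str, training: list[tuple[str, str]], k: int = 3, n: int = 2) -> str:
    """Return the label with the largest summed similarity among the k nearest neighbours.

    Assumption: similarity-weighted voting; the paper's exact kNN variant may differ.
    """
    query = char_ngrams(doc, n)
    scored = sorted(
        ((cosine(query, char_ngrams(text, n)), label) for text, label in training),
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)

# Hypothetical toy training set (document text, category label).
training = [
    ("股票市场今天大幅上涨，投资者信心增强", "finance"),
    ("银行利率调整影响股票和债券市场", "finance"),
    ("球队在比赛中获得冠军，球迷欢呼", "sports"),
    ("足球比赛今晚开始，球员状态良好", "sports"),
]
print(knn_classify("股票价格上涨，市场活跃", training, k=3))  # → finance
```

Here the query shares the bigrams 股票, 市场, 上涨 and 涨， with the first finance document and 股票, 市场 with the second, but none with either sports document. So the two finance neighbours carry all of the vote weight. In practice the training vectors would be computed once and reused rather than rebuilt for every query.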