A Kind of Self-Constructed Category Dictionary in Chinese Text Classification
Kun Zhou, Ya Ping Dai, Feng Gao, Ji Hong Zou
DOI: https://doi.org/10.4028/www.scientific.net/AMM.644-650.2206
2014-01-01
Applied Mechanics and Materials
Abstract: By applying the word-segmentation technology of the TRIP database and counting in detail every word that appears in the database, a self-constructed category dictionary (SCC-dictionary) for Chinese text classification is proposed. To address the high dimensionality and sparseness of the vector space model, a four-dimensional feature vector space model (FFVSM) is presented. A text classifier is then designed with the Support Vector Machine (SVM) algorithm. Experimental results show two achievements: first, the SCC-dictionary can replace a manually written dictionary with the same effect; second, the FFVSM reduces the computing load relative to a high-dimensional feature vector space model while keeping classification precision at 86.87%, recall at 95.12%, and F1 value at 90.81%.
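As a quick consistency check on the reported figures, the F1 value can be recomputed from the stated precision and recall using the standard harmonic-mean definition. The sketch below assumes that definition is the one the authors used; the function name `f1_score` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
reported_precision = 0.8687
reported_recall = 0.9512

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1 * 100:.2f}%")
```

Under this assumption the recomputed value agrees with the 90.81% stated in the abstract to two decimal places.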