INTEGRATING KNN AND HIERARCHICAL SVM FOR AUTOMATIC TEXT CLASSIFICATION

Jinhua Wang, Hui Yu, Wen Chan, Xiangdong Zhou, Bole Shi
DOI: https://doi.org/10.3969/j.issn.1000-386x.2016.02.009
2016-01-01
Abstract: For automatic hierarchical classification of large-scale text, the k-nearest neighbours (KNN) algorithm classifies efficiently but is less effective on samples that lie near category boundaries. Support vector machine (SVM) classifiers achieve higher accuracy; however, many earlier multi-class SVM algorithms are built from a number of independent binary classifiers, which makes them slow to train and ill-suited to hierarchical category structures. This paper presents a new method that integrates KNN and a hierarchical SVM for automatic text classification. First, we modify the KNN algorithm to quickly obtain the class labels of the K nearest neighbours and use these labels to sift out candidate categories for each document. Then a multi-class sparse hierarchical SVM classifier with uniform learning partitions each sample top-down through the category hierarchy, giving efficient and accurate classification of the documents. Experimental results show that, on both single-layer and multi-layer classification datasets, this method achieves higher classification accuracy than either method used alone, while its classification time is close to that of the fastest single classifier.
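The two-stage pipeline described in the abstract (KNN to sift candidate categories, then top-down classification through the hierarchy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: documents are assumed to be sparse term-weight vectors (e.g. TF-IDF) stored as dicts, the category tree is a hypothetical `node -> children` dict, and the KNN stage is plain cosine-similarity voting rather than the paper's modified KNN. The per-node `weights` are hypothetical toy linear scorers standing in for a trained hierarchical SVM; the paper's sparse SVM training with uniform learning is not reproduced. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Sparse term-weight vector, e.g. TF-IDF (assumption for this sketch).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_candidates(doc: Vector, train: List[Tuple[Vector, str]],
                   k: int, max_candidates: int) -> Set[str]:
    """Stage 1: keep the leaf labels voted most often by the k nearest neighbours."""
    nearest = sorted(train, key=lambda item: cosine(doc, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return {label for label, _ in votes.most_common(max_candidates)}


def leaves_under(node: str, tree: Dict[str, List[str]]) -> Set[str]:
    """All leaf categories in the subtree rooted at `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    return set().union(*(leaves_under(c, tree) for c in children))


def top_down_classify(doc: Vector, tree: Dict[str, List[str]],
                      weights: Dict[str, Vector], candidates: Set[str],
                      root: str = "root") -> str:
    """Stage 2: descend the tree, scoring only children that lead to a candidate."""
    node = root
    while tree.get(node):
        allowed = [c for c in tree[node] if leaves_under(c, tree) & candidates]
        if not allowed:  # no candidate below this node: fall back to all children
            allowed = tree[node]
        # Hypothetical linear score w . x in place of a trained SVM decision function.
        node = max(allowed, key=lambda c: sum(w * doc.get(t, 0.0)
                                              for t, w in weights[c].items()))
    return node


# Toy data, purely illustrative.
tree = {"root": ["sports", "tech"], "sports": ["football", "tennis"],
        "tech": ["ai", "hardware"]}
train = [({"goal": 1.0, "match": 1.0}, "football"),
         ({"goal": 1.0, "league": 1.0}, "football"),
         ({"serve": 1.0, "match": 1.0}, "tennis"),
         ({"neural": 1.0, "model": 1.0}, "ai"),
         ({"chip": 1.0, "model": 1.0}, "hardware")]
weights = {"sports": {"goal": 1.0, "match": 1.0, "serve": 1.0, "league": 1.0},
           "tech": {"neural": 1.0, "model": 1.0, "chip": 1.0},
           "football": {"goal": 1.0, "league": 1.0}, "tennis": {"serve": 1.0},
           "ai": {"neural": 1.0}, "hardware": {"chip": 1.0}}

doc = {"goal": 1.0, "match": 0.5}
cands = knn_candidates(doc, train, k=3, max_candidates=2)
print(sorted(cands))                                        # ['football', 'tennis']
print(top_down_classify(doc, tree, weights, cands))         # football
```

The intended benefit of this split is that the cheap KNN stage prunes whole subtrees (here the `tech` branch is never scored), so the more accurate but costlier per-node classifiers run only along branches that still contain a plausible category.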