An Improved KNN Text Categorization Method Based on Spanning Tree Documents Clustering

Zheng Wei, Feng Guo-He, Nan Zheng
DOI: https://doi.org/10.1109/itap.2011.6006411
2011-01-01
Abstract: To address two shortcomings of the K-Nearest Neighbor (KNN) classification method, namely its low efficiency and the difficulty of determining the optimal value of the parameter K, a new KNN classification method based on spanning tree document clustering is presented. The basic idea is to use a spanning-tree-based clustering algorithm to cluster the documents automatically. After a few nodes are cut, each generated sub-tree retains a few core document nodes, and the retained core nodes are merged into a new document. During classification, the similarity between the test document and the center document of each sub-tree is computed, and the test document is assigned the category of the sub-tree with which it has the largest similarity. Experiments show that the proposed method is more stable than KNN in classification, while also improving classification speed and avoiding the need to choose a value for the parameter K.
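The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the tree is built, which nodes are cut, or how a sub-tree gets its category. The sketch therefore assumes a maximum-similarity spanning tree over cosine similarities of raw term counts, cuts edges below a hypothetical `cut_threshold`, keeps the `core_size` members most similar to their sub-tree as core nodes, and labels each sub-tree by majority vote. All function and parameter names are illustrative.

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def max_spanning_tree(vecs):
    """Prim's algorithm on the complete similarity graph; returns (sim, i, j) edges."""
    n = len(vecs)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        best = max(
            ((cosine(vecs[i], vecs[j]), i, j)
             for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: e[0],
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges

def build_centers(texts, labels, cut_threshold=0.3, core_size=2):
    """Cut weak tree edges, then merge each sub-tree's core nodes into one center document."""
    vecs = [Counter(t.lower().split()) for t in texts]
    parent = list(range(len(vecs)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sim, i, j in max_spanning_tree(vecs):
        if sim >= cut_threshold:  # assumption: edges below the threshold are cut
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for idx in range(len(vecs)):
        clusters[find(idx)].append(idx)

    centers = []
    for members in clusters.values():
        total = sum((vecs[m] for m in members), Counter())
        # Assumption: core nodes are the members most similar to the whole sub-tree.
        core = sorted(members, key=lambda m: cosine(vecs[m], total), reverse=True)[:core_size]
        center = sum((vecs[m] for m in core), Counter())
        # Assumption: a sub-tree takes the majority category of its members.
        label = Counter(labels[m] for m in members).most_common(1)[0][0]
        centers.append((center, label))
    return centers

def classify(text, centers):
    """Assign the category of the sub-tree whose center document is most similar."""
    vec = Counter(text.lower().split())
    return max(centers, key=lambda c: cosine(vec, c[0]))[1]

# Toy training set (made up for illustration only).
train_texts = [
    "stock market shares trading",
    "market shares investors stock",
    "football match goal team",
    "team goal league football",
]
train_labels = ["finance", "finance", "sport", "sport"]
centers = build_centers(train_texts, train_labels)
```

With this toy data the tree splits into two sub-trees, so `classify("investors buy stock shares", centers)` compares the test document against only two center documents rather than all four training documents. This is where the speed-up over plain KNN comes from, and no neighbor count K has to be chosen.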