The Weighted KNN Text Categorization Algorithm Based on Training Set Cutting
Xin SUN, Tong OUYANG, XiMin YAN, YuMing SHANG, WenHao GUO
DOI: https://doi.org/10.3772/j.issn.2095-915x.2016.05.002
2016-01-01
Abstract: Text categorization is a key research field in information retrieval. Feature selection is an important part of document processing and strongly influences document classification. In this paper, an improved feature selection algorithm based on information gain is proposed to improve the accuracy of text feature selection. The K-Nearest Neighbor (KNN) algorithm is widely used in text categorization because of its high accuracy and stability. However, the number of training samples and their positions can affect the classification performance of KNN. We therefore propose a weighted KNN classification algorithm based on training set cutting, which uses rough sets and the concept of a membership function to improve classification accuracy. Finally, the new algorithm is tested in a text categorization experiment, and the results indicate that the proposed algorithm is effective.
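The general idea of weighting KNN votes can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: the abstract does not specify the weighting scheme, the rough-set training set cutting, or the membership function, so none of those are reproduced here. The sketch assumes documents are sparse term-weight dictionaries, uses cosine similarity as the vote weight, and the names `cosine` and `weighted_knn` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_knn(query, training, k=3):
    """Classify `query` by a similarity-weighted vote of its k nearest neighbors.

    `training` is a list of (vector, label) pairs. Assumption: the vote
    weight is the cosine similarity itself; the paper may weight differently.
    """
    # Rank all training documents by similarity to the query, most similar first.
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Each of the k nearest neighbors adds its similarity to its class score.
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy data (invented for illustration only).
training = [
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"ball": 1.0, "goal": 1.0}, "sports"),
    ({"ball": 1.0, "team": 1.0}, "sports"),
]
query = {"stock": 1.0, "market": 1.0, "ball": 0.1}
predicted = weighted_knn(query, training, k=3)
```

In this toy case an unweighted majority vote over the three neighbors would favor "sports" (two of three), while the weighted vote favors the single, much closer "finance" document. This shows why the positions of training samples relative to the query matter for KNN.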