k-nearest neighbor classification method for class-imbalanced problem
Huaping GUO, Jun ZHOU, Chang'an WU, Ming FAN
DOI: https://doi.org/10.11772/j.issn.1001-9081.2017092181
2018-01-01
Journal of Computer Applications
Abstract: To improve the performance of the k-Nearest Neighbor (kNN) model on class-imbalanced data, a new kNN classification algorithm was proposed. Unlike traditional kNN, in the learning phase the majority set was partitioned into several clusters by a partitioning method (such as K-Means), and each cluster was then merged with the minority set to form a new training set for one kNN model, so that a classifier library consisting of several kNN models was constructed. In the prediction phase, a partitioning method (such as K-Means) was used to select a model from the classifier library to predict the class of a sample. In this way, the kNN model can efficiently discover local characteristics of the data while fully accounting for the effect of class imbalance on classifier performance. In addition, the efficiency of kNN was improved. To further enhance the performance of the proposed algorithm, the Synthetic Minority Over-sampling TEchnique (SMOTE) was applied to it. Experimental results on KEEL data sets show that the proposed algorithm effectively enhances the generalization performance of the kNN method on the evaluation measures of recall, g-mean, f-measure and Area Under ROC Curve (AUC) when the majority set is partitioned by a random partition strategy, and that it also shows great superiority over other state-of-the-art methods.
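The learning and prediction phases described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. All function names (`kmeans`, `build_library`, `predict`), the toy data, the farthest-point centroid initialization, and the rule "select the model whose cluster centroid is nearest to the sample" are assumptions made for illustration. The abstract says only that a partitioning method such as K-Means is used for selection. SMOTE is omitted.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, n_clusters, n_iter=20):
    """Plain Lloyd's K-Means. Initialization is a deterministic
    farthest-point heuristic (an assumption, chosen for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < n_clusters:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    clusters = []
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, clusters


def build_library(majority, minority, n_clusters, maj_label=0, min_label=1):
    """Learning phase: one training set per majority cluster, each merged
    with the whole minority set. Returns a list of (centroid, training_set)."""
    centroids, clusters = kmeans(majority, n_clusters)
    library = []
    for centroid, members in zip(centroids, clusters):
        if not members:  # skip empty clusters
            continue
        train = ([(p, maj_label) for p in members]
                 + [(p, min_label) for p in minority])
        library.append((centroid, train))
    return library


def knn_predict(train, x, k):
    """Standard kNN majority vote over one training set."""
    neighbours = sorted(train, key=lambda item: dist(item[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def predict(library, x, k=3):
    """Prediction phase (assumed rule): use the kNN model whose majority
    cluster centroid is closest to the query sample."""
    _, train = min(library, key=lambda entry: dist(entry[0], x))
    return knn_predict(train, x, k)


# Toy data: 40 majority points in two groups, 4 minority points.
group = [(i * 0.1, j * 0.1) for i in range(5) for j in range(4)]
majority = group + [(x + 10.0, y) for x, y in group]
minority = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]

library = build_library(majority, minority, n_clusters=2)
print(predict(library, (0.2, 0.1)))  # query inside a majority group
print(predict(library, (5.0, 5.2)))  # query next to the minority points
```

Each model in the sketch sees only one majority cluster plus the full minority set. Its training set is therefore smaller and less imbalanced than the original data, which is the mechanism the abstract credits for both the improved minority-class performance and the improved efficiency.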