Abstract: The k nearest neighbor (kNN) method is a popular classification method in data mining and statistics because of its simple implementation and strong classification performance. However, it is impractical for traditional kNN methods to assign a fixed k value (even one set by experts) to all test samples. Previous solutions assign different k values to different test samples by cross validation, but they are usually time-consuming. This paper proposes a kTree method that learns different optimal k values for different test/new samples by adding a training stage to kNN classification. Specifically, in the training stage, the kTree method first learns optimal k values for all training samples with a new sparse reconstruction model, and then constructs a decision tree (namely, kTree) from the training samples and the learned optimal k values. In the test stage, the kTree quickly outputs the optimal k value for each test sample, and kNN classification is then conducted using the learned optimal k value and all training samples. As a result, the proposed kTree method has a similar running cost but higher classification accuracy compared with traditional kNN methods, which assign a fixed k value to all test samples. Moreover, the proposed kTree method has a lower running cost but achieves similar classification accuracy compared with recent kNN methods that assign different k values to different test samples. This paper further proposes an improved version of the kTree method (namely, the k*Tree method) that speeds up the test stage by additionally storing information about the training samples in the leaf nodes of kTree, such as the training samples located in the leaf nodes, their kNNs, and the nearest neighbors of these kNNs. We call the resulting decision tree k*Tree; it enables kNN classification using a subset of the training samples in the leaf nodes rather than all training samples, as used in the recent kNN methods. This reduces the running cost of the test stage. Finally, experimental results on 20 real data sets show that the proposed methods (i.e., kTree and k*Tree) are much more efficient than the compared methods on classification tasks.
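
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sparse reconstruction model is not specified here, so the per-sample k values are learned with a simple leave-one-out search instead (a swapped-in stand-in), and the decision tree is a tiny CART-style tree written for this example. All function names (`knn_predict`, `learn_k`, `build_tree`, `tree_k`, `ktree_classify`) and the candidate k values are hypothetical.

```python
from collections import Counter
import math

def knn_predict(x, X, y, k, skip=None):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted((i for i in range(len(X)) if i != skip),
                   key=lambda i: math.dist(x, X[i]))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]

def learn_k(X, y, candidates=(1, 3, 5)):
    """Assumption: stand-in for the sparse reconstruction step. For each training
    sample, keep the smallest candidate k that classifies it correctly under
    leave-one-out; fall back to the largest candidate otherwise."""
    ks = []
    for i, x in enumerate(X):
        good = [k for k in candidates if knn_predict(x, X, y, k, skip=i) == y[i]]
        ks.append(good[0] if good else candidates[-1])
    return ks

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, ks, depth=0, max_depth=3):
    """Tiny decision tree mapping a feature vector to a k value.
    Leaves are ints (a k value); internal nodes are (feature, threshold, left, right)."""
    if depth == max_depth or len(set(ks)) == 1:
        return Counter(ks).most_common(1)[0][0]
    best = None
    for f in range(len(X[0])):
        # Excluding the largest value keeps both sides of every split non-empty.
        for t in sorted({x[f] for x in X})[:-1]:
            left = [i for i, x in enumerate(X) if x[f] <= t]
            right = [i for i, x in enumerate(X) if x[f] > t]
            score = (len(left) * gini([ks[i] for i in left]) +
                     len(right) * gini([ks[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # all samples identical: no split possible
        return Counter(ks).most_common(1)[0][0]
    _, f, t, left, right = best

    def sub(idx):
        return build_tree([X[i] for i in idx], [ks[i] for i in idx],
                          depth + 1, max_depth)

    return (f, t, sub(left), sub(right))

def tree_k(tree, x):
    """Test stage, step 1: walk the tree to get the k value for sample x."""
    while isinstance(tree, tuple):
        f, t, lo, hi = tree
        tree = lo if x[f] <= t else hi
    return tree

def ktree_classify(x, X, y, tree):
    """Test stage, step 2: plain kNN over all training samples with the predicted k."""
    return knn_predict(x, X, y, tree_k(tree, x))

# Usage on a toy data set with two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["a"] * 4 + ["b"] * 4
tree = build_tree(X, learn_k(X, y))
print(ktree_classify((0.5, 0.5), X, y, tree))
```

In this sketch the test stage still searches all training samples, as in kTree. A k*Tree-style variant would additionally store, in each leaf, the training samples that reach it together with their neighbors, and restrict the search in `knn_predict` to that subset.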

Ordinal semi-supervised k-nearest neighbor algorithm for small training datasets

A Semi-Supervised Approach Based on K-Nearest Neighbor

Unsupervised Feature Selection with Ordinal Locality

Instance-Ranking: A New Perspective to Consider the Instance Dependency for Classification

2-Stage Instance Selection Algorithm for KNN Based on Nearest Unlike Neighbors

Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data

Under-bagging Nearest Neighbors for Imbalanced Classification

Introduction to Machine Learning: K-Nearest Neighbors

Incremental learning algorithm for large-scale semi-supervised ordinal regression

A New k-Nearest Neighbors Algorithm for Learning from Multiple Experts' Uncertain Decisions

Efficient kNN Classification with Different Numbers of Nearest Neighbors

Neighbor selection for multilabel classification

An Adaptive K-Nearest Neighbor Algorithm

k-nearest neighbor classification method for class-imbalanced problem

An ensemble-driven k-NN approach to ill-posed classification problems

A Graph-Based Semi-Supervised K Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification

Hybrid k-Nearest Neighbor Classifier

A k-Nearest Neighbor Medoid-Based Outlier Detection Algorithm

Distributionally Robust Weighted $k$-Nearest Neighbors

A new two-layer nearest neighbor selection method for kNN classifier

Efficient Estimation of k for the Nearest Neighbors Class of Methods