KNN Classification with One-step Computation

Shichao Zhang,Jiaye Li
DOI: https://doi.org/10.1109/TKDE.2021.3119140
2021-11-19
Abstract: KNN classification is an improvisational learning mode: computation is carried out only when a test sample is to be predicted, at which point a suitable K value is set and the K nearest neighbors are searched from the whole training sample space. This is referred to as the lazy part of KNN classification. The lazy part has been the bottleneck in applying KNN classification because of the complete search for the K nearest neighbors. In this paper, a one-step computation is proposed to replace the lazy part of KNN classification. The one-step computation transforms the lazy part into a matrix computation as follows. Given a test sample, the training samples are first used to fit it with a least squares loss function. Then, a relationship matrix is generated by weighting all training samples according to their influence on the test sample. Finally, a group lasso is employed to perform sparse learning of the relationship matrix. In this way, setting the K value and searching for the K nearest neighbors are integrated into a unified computation. In addition, a new classification rule is proposed to improve the performance of one-step KNN classification. The proposed approach is evaluated experimentally, and the results demonstrate that one-step KNN classification is efficient and promising.
Machine Learning,Artificial Intelligence
What problem does this paper attempt to address?
This paper attempts to solve several key problems in KNN classification:

1. **The problem of lazy learning**: KNN classification is a lazy learning method, that is, computation is carried out only when a test sample is predicted. This includes setting an appropriate K value and searching for the K nearest neighbors in the entire training sample space. This lazy part is the bottleneck in applying KNN classification, because searching for the K nearest neighbors requires traversing the entire training set, which is computationally expensive and time-consuming.
2. **Unified computation of K-value setting and nearest-neighbor search**: Traditional KNN classification algorithms usually proceed in two steps: first determine an appropriate K value, then search for the nearest neighbors according to this K value. These two operations are carried out independently, which increases the computational cost. The paper proposes a one-step computation that integrates setting the K value and searching for the K nearest neighbors into a single unified computation, simplifying the process and improving efficiency.
3. **Improving classification performance**: To further improve classification performance, the paper also proposes a new classification rule. By introducing a weighting matrix and using group lasso for sparse learning, the model can better capture the relationship between test data and training data, thereby improving classification accuracy.

### Specific solutions

The solutions proposed in the paper mainly include the following aspects:

- **Least-squares loss function fitting**: Given a test sample, use the training samples to fit it through the least-squares loss function.
- **Generating a relationship matrix**: Generate a relationship matrix by weighting all training samples according to their influence on the test sample. Each weight reflects how strongly a training sample influences the test sample.
- **Group lasso sparse learning**: Use group lasso to perform sparse learning on the relationship matrix, yielding a sparse representation of it. The nonzero entries select both the K value and the corresponding nearest neighbors.
- **New classification rule**: Propose a new classification rule that classifies the test sample by weighting its nearest neighbors, thereby improving classification accuracy.

### Summary

The main contribution of this paper is a new non-lazy KNN algorithm that completes the K-value setting and the nearest-neighbor search simultaneously through a one-step computation, thereby greatly reducing the computational complexity, and that improves classification performance by introducing a new classification rule. The experimental results show that this method is superior to the existing methods in both classification performance and running cost.
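The steps listed under "Specific solutions" can be illustrated with a small NumPy sketch. It is not the paper's implementation, and several parts are assumptions made for illustration: the objective is taken to be `0.5 * ||X.T @ W - Y||_F^2 + lam * sum_i ||W[i, :]||_2` with one group per training sample (one row of `W`), a generic proximal gradient loop stands in for whatever solver the authors use, and the prediction step is a plain weighted vote rather than the paper's new classification rule. The function names `group_soft_threshold`, `relationship_matrix`, and `predict`, and the value of `lam`, are hypothetical.

```python
import numpy as np

def group_soft_threshold(V, tau):
    """Row-wise proximal operator of tau * sum_i ||V[i, :]||_2."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    # Rows whose norm is below tau are set exactly to zero.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * V

def relationship_matrix(X, Y, lam=0.1, n_iter=5000):
    """Approximately minimize 0.5*||X.T @ W - Y||_F^2 + lam * sum_i ||W[i, :]||_2.

    X: (n, d) training samples as rows; Y: (d, m) test samples as columns.
    Returns W of shape (n, m); W[i, j] weights training sample i for test j.
    """
    W = np.zeros((X.shape[0], Y.shape[1]))
    # Step size 1/L, where L is the squared spectral norm of X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X @ (X.T @ W - Y)  # gradient of the least-squares term
        W = group_soft_threshold(W - step * grad, step * lam)
    return W

def predict(W, labels):
    """Assumed rule: weighted vote using the magnitude of each weight."""
    classes = np.unique(labels)
    scores = np.stack([np.abs(W[labels == c]).sum(axis=0) for c in classes])
    return classes[scores.argmax(axis=0)]

# Toy data: class 0 lies near (1, 0), class 1 lies near (0, 1).
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
              [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
y = np.array([[0.95], [0.05]])  # one test sample, stored as a column

W = relationship_matrix(X, y)
neighbors = np.flatnonzero(W[:, 0])  # training samples with nonzero weight
k = neighbors.size                   # K is read off the solution, not preset
pred = predict(W, labels)[0]
print("neighbors:", neighbors, "K:", k, "predicted class:", pred)
```

With a single test sample each row of `W` has one entry, so the row-wise group penalty reduces to an ordinary lasso penalty, and the nonzero weights play the role of the selected neighbors. In this toy example the test sample is assigned to class 0. A larger `lam` keeps fewer training samples, which is how K emerges from the computation instead of being set beforehand.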