Abstract: We present a novel data-driven strategy for choosing the hyperparameter $k$ in the $k$-NN regression estimator without using any hold-out data. We treat the choice of the hyperparameter as an iterative procedure (over $k$) and propose a strategy that is easy to implement in practice, based on the idea of early stopping and the minimum discrepancy principle. This model selection strategy is proven to be minimax-optimal over some smoothness function classes, for instance, the class of Lipschitz functions on a bounded domain. The method often improves statistical performance on artificial and real-world data sets in comparison to other model selection strategies, such as the hold-out method, 5-fold cross-validation, and the AIC criterion. The novelty of the strategy comes from reducing the computational time of the model selection procedure while preserving the statistical (minimax) optimality of the resulting estimator. More precisely, given a sample of size $n$, if one must choose $k$ among $\left\{ 1, \ldots, n \right\}$, and $\left\{ f^1, \ldots, f^n \right\}$ are the estimators of the regression function, the minimum discrepancy principle requires computing only a fraction of the estimators, which is not the case for generalized cross-validation, Akaike's AIC criterion, or the Lepskii principle.
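
To make the notation in the abstract concrete, the display below gives one standard way to write the $k$-NN estimator and its empirical risk, followed by a discrepancy-type stopping rule. The symbols $N_k(x)$ (indices of the $k$ nearest sample points to $x$), $R_k$ (empirical risk), and $\sigma^2$ (noise variance, assumed known here) are not defined in the abstract and are introduced only for illustration. The exact form of the rule in the paper may differ.

$$
f^k(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i,
\qquad
R_k = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f^k(x_i) \bigr)^2,
\qquad
\hat{k} = \max \left\{ k \in \{1, \ldots, n\} : R_j \le \sigma^2 \ \text{for all } j \le k \right\}.
$$

Under this reading, $R_1 = 0$ when the sample points are distinct, since each point is its own nearest neighbor, and only $f^1, \ldots, f^{\hat{k}+1}$ need to be computed before the rule stops.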

What problem does this paper attempt to address?

This paper addresses how to select the hyperparameter $k$ in the $k$-NN regression estimator without using any hold-out data. Specifically, the authors propose a new strategy based on the Minimum Discrepancy Principle (MDP) that selects $k$ iteratively, and that reduces computation time through early stopping while preserving statistical optimality.

### Background and Motivation

In nonparametric regression, the theoretical performance of the $k$-NN regression estimator has been widely studied since the 1970s. However, selecting an appropriate value of $k$ remains a challenge. Common methods include cross-validation (such as 5-fold cross-validation) and the AIC criterion, but these methods usually need to compute the estimators for all candidate values of $k$, which is computationally expensive, especially for large data sets.

### Proposed Method

The paper proposes a new data-driven strategy that uses the Minimum Discrepancy Principle to select the value of $k$. The main features of this method are:

- **No need for hold-out data**: Hold-out and cross-validation methods split the data set into a training set and a test set, while the method in this paper selects $k$ using the training data only.
- **Early stopping**: The empirical risk is monitored along the iterations over $k$, and the iteration stops when the empirical risk starts to fit the noise, thus avoiding overfitting.
- **High computational efficiency**: Compared with other methods (such as generalized cross-validation or the AIC criterion), this method needs to compute only a fraction of the estimators, which reduces the computation time.

### Theoretical Results

The authors prove that the proposed Minimum Discrepancy Principle strategy is minimax-optimal over some smoothness function classes (such as the class of Lipschitz functions on a bounded domain). Specifically, for a given sample size $n$, if $k$ needs to be selected from $\{1,\ldots,n\}$, the Minimum Discrepancy Principle requires computing only a fraction of the estimators, while generalized cross-validation, the AIC criterion, and the Lepskii principle require computing all of them.

### Experimental Results

The experimental results show that the proposed method often improves on other model selection strategies, such as 5-fold cross-validation, the hold-out method, and generalized cross-validation, on both artificial and real-world data sets. In addition, the method reduces the computation time of the model selection process.

### Key Contributions

1. **New strategy**: A new data-driven strategy based on the Minimum Discrepancy Principle and early stopping is proposed for selecting the hyperparameter $k$ in $k$-NN regression.
2. **Theoretical guarantee**: The strategy is proven to be minimax-optimal over some function classes.
3. **Computational efficiency**: Compared with traditional methods, the strategy reduces the computation time.
4. **Practical application**: Experiments on multiple data sets support the effectiveness of the method.

### Conclusion

The paper proposes a new data-driven strategy for selecting the hyperparameter $k$ in $k$-NN regression and supports it both theoretically and experimentally. The method combines good statistical performance with an advantage in computational efficiency.
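
The early-stopping idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the noise variance `sigma2` is known (in practice it would have to be estimated), that the iteration runs over increasing $k$ starting from $k = 1$, and that the rule stops the first time the empirical risk exceeds `sigma2`. The function name `mdp_choose_k` and its return values are hypothetical.

```python
import numpy as np


def mdp_choose_k(X, y, sigma2):
    """Discrepancy-type early stopping for k-NN regression (illustrative sketch).

    Assumptions (not taken from the paper): sigma2 is the known noise variance,
    k is increased from 1, and we stop the first time the in-sample empirical
    risk exceeds sigma2. Returns (chosen_k, number_of_estimators_evaluated).
    """
    X = np.asarray(X, dtype=float)  # expected shape: (n, d)
    y = np.asarray(y, dtype=float)  # expected shape: (n,)
    n = y.shape[0]

    # Pairwise squared Euclidean distances (n x n); fine for small n.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Neighbor order per point; each point comes first in its own row
    # when the sample points are distinct.
    order = np.argsort(d2, axis=1, kind="stable")

    running_sum = np.zeros(n)  # sum of responses over the k nearest neighbors
    chosen_k, evaluated = 1, 0
    for k in range(1, n + 1):
        # Update the k-NN fit incrementally by adding the k-th neighbor.
        running_sum += y[order[:, k - 1]]
        fitted = running_sum / k
        risk = np.mean((y - fitted) ** 2)  # empirical risk of the k-NN fit
        evaluated += 1
        if risk > sigma2:
            break  # risk has crossed the noise level: stop early
        chosen_k = k
    return chosen_k, evaluated


# Usage on synthetic data (all values below are illustrative).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(200, 1))
y_demo = np.sin(2 * np.pi * X_demo[:, 0]) + 0.3 * rng.standard_normal(200)
k_hat, n_evaluated = mdp_choose_k(X_demo, y_demo, sigma2=0.3 ** 2)
print(f"chosen k = {k_hat}, estimators evaluated = {n_evaluated} of {len(y_demo)}")
```

Because the loop breaks as soon as the risk crosses the noise level, only the first few estimators are formed, whereas a criterion that compares all candidates would evaluate every $k$ from 1 to $n$.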

Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regression
