Abstract: The nearest neighbor classifier is arguably the simplest and most popular nonparametric classifier available in the literature. However, due to the concentration of pairwise distances and the violation of the neighborhood structure, this classifier often suffers in high-dimension, low-sample-size (HDLSS) situations, especially when the scale difference between the competing classes dominates their location difference. Several attempts have been made in the literature to take care of this problem. In this article, we discuss some of these existing methods and propose some new ones. We carry out some theoretical investigations in this regard and analyze several simulated and benchmark datasets to compare the empirical performance of the proposed methods with that of some existing ones.

What problem does this paper attempt to address?

This paper addresses the degraded performance of the nearest neighbor classifier in high-dimensional, low-sample-size (HDLSS) settings, especially when the scale difference between classes dominates their location difference. Specifically:

1. **Concentration of distances in high-dimensional data**: In high-dimensional space, the pairwise distances between points tend to concentrate. This destroys the neighborhood structure that the nearest neighbor classifier relies on and degrades its accuracy. For example, in high dimensions, even if the distribution centers of two classes differ significantly, scale differences can prevent the nearest neighbor classifier from separating the classes effectively.
2. **Limitations of existing methods**: The existing literature proposes several improved methods for this problem, but they still fall short in certain situations, such as scale problems. For example, the scale-adjusted classifier proposed by Chan and Hall (the CH classifier) performs well on location-scale problems but still performs poorly when the classes differ only in scale.
3. **Proposing new solutions**: The paper proposes an improved scale-adjusted nearest neighbor classifier (the Modified Chan and Hall classifier, or MCH classifier) and shows in multiple experiments that it performs better on high-dimensional data. The paper also explores a classification method based on minimum-distance features (the MDist classifier), which performs well in complex situations such as mixed distributions.

Through these methods, the paper aims to improve the classification performance of the nearest neighbor classifier in the high-dimensional, low-sample-size setting, especially when there are significant scale differences between classes.
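
The distance-concentration effect described above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's experimental setup: the Gaussian classes, the dimensions, the sample sizes, the scales `sigma1` and `sigma2`, and both function names are assumptions chosen for illustration. The first function measures how the relative spread of pairwise distances shrinks as the dimension grows. The second runs a plain 1-nearest-neighbor rule on two classes that share the same center and differ only in scale.

```python
import numpy as np


def relative_spread(d, n=50, seed=0):
    """Std/mean of pairwise Euclidean distances among n standard normal points in R^d."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, d))
    sq_norms = (x ** 2).sum(axis=1)
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    iu = np.triu_indices(n, k=1)  # count each unordered pair once
    dists = np.sqrt(np.maximum(sq_dists[iu], 0.0))  # clip tiny negative round-off
    return float(dists.std() / dists.mean())


def one_nn_scale_only(d=1000, n_train=10, n_test=100,
                      sigma1=1.0, sigma2=2.0, seed=0):
    """Per-class accuracy of 1-NN when both classes are centered at 0
    and differ only in scale (illustrative assumption, not the paper's data)."""
    rng = np.random.default_rng(seed)
    train = np.vstack([rng.normal(0.0, sigma1, size=(n_train, d)),
                       rng.normal(0.0, sigma2, size=(n_train, d))])
    train_labels = np.repeat([0, 1], n_train)
    test = np.vstack([rng.normal(0.0, sigma1, size=(n_test, d)),
                      rng.normal(0.0, sigma2, size=(n_test, d))])
    test_labels = np.repeat([0, 1], n_test)

    # Squared distance from every test point to every training point.
    sq = ((test ** 2).sum(axis=1)[:, None]
          + (train ** 2).sum(axis=1)[None, :]
          - 2.0 * test @ train.T)
    pred = train_labels[np.argmin(sq, axis=1)]  # label of the nearest neighbor
    return [float((pred[test_labels == c] == c).mean()) for c in (0, 1)]


if __name__ == "__main__":
    for dim in (2, 100, 1000):
        print(dim, relative_spread(dim))
    print(one_nn_scale_only())
```

Under these assumptions, the squared distance between two points of the small-scale class is roughly `2 * d * sigma1**2`, between two points of the large-scale class roughly `2 * d * sigma2**2`, and across classes roughly `d * (sigma1**2 + sigma2**2)`. The cross-class value lies between the other two, so a test point from the large-scale class tends to find its nearest neighbor in the small-scale class, and the unadjusted classifier assigns almost everything to that class.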

On high-dimensional modifications of the nearest neighbor classifier

Discriminant adaptive nearest neighbor classification and regression

Accelerating Exact Nearest Neighbor Search in High Dimensional Euclidean Space Via Block Vectors

A depth-based nearest neighbor algorithm for high-dimensional data classification

Efficient Approximate Algorithms for the Closest Pair Problem in High Dimensional Spaces

A Modified Nearest Neighbor Classification Approach Based on Class-Wise Local Information

Doubly Approximate Nearest Neighbor Classification

Classification algorithm based on near neighbor interval of dimensional samples

In defense of Nearest-Neighbor based image classification

Making the nearest neighbor meaningful for time series classification

K-Ns: A Classifier by the Distance to the Nearest Subspace

A Novel Separating Hyperplane Classification Framework to Unify Nearest-Class-Model Methods for High-Dimensional Data

Minimizing Nearest Neighbor Classification Error for Nonparametric Dimension Reduction

Large-margin Nearest Neighbor Classifiers Via Sample Weight Learning

A Novel Two-Level Nearest Neighbor Classification Algorithm Using an Adaptive Distance Metric

Perceptual Nearest Neighbors for Classification

Efficiently Learning a Distance Metric for Large Margin Nearest Neighbor Classification

Improving classifier decision boundaries using nearest neighbors

On high-dimensional modifications of some graph-based two-sample tests

Nonlinear Nearest Subspace Classifier