A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach

Hao Chen, Yin Xia
DOI: https://doi.org/10.1080/01621459.2021.1953507
IF: 4.369
2021-08-31
Journal of the American Statistical Association
Abstract: Many statistical methodologies for high-dimensional data assume the population is normal. Although a few multivariate normality tests have been proposed, to the best of our knowledge, none of them can properly control the Type I error when the dimension is larger than the number of observations. In this work, we propose a novel nonparametric test that uses nearest neighbor information. The proposed method guarantees asymptotic Type I error control under the high-dimensional setting. Simulation studies verify the empirical size of the proposed test when the dimension grows with the sample size and, at the same time, show that the new test has superior power compared with alternative methods. We also illustrate our approach on two widely used datasets from the high-dimensional classification and clustering literature, where deviation from the normality assumption may lead to invalid conclusions.
statistics & probability