Distance-based outlier detection for high dimension, low sample size data

Jeongyoun Ahn, Myung Hee Lee, Jung Ae Lee
DOI: https://doi.org/10.1080/02664763.2018.1452901
IF: 1.416
2018-03-24
Journal of Applied Statistics
Abstract: Despite the popularity of high dimension, low sample size data analysis, the issue of sample integrity, in particular the possibility of outliers in the data, has not received enough attention. A new outlier detection procedure for data with much larger dimensionality than the sample size is presented. The proposed method is motivated by asymptotic properties of high-dimensional distance measures. Empirical studies suggest that high-dimensional outlier detection is more likely to suffer from a swamping effect than from a masking effect, and thus yields more false positives than false negatives. We compare the proposed approaches with existing methods using simulated data from various population settings. A real data example is presented, with consideration of the implications of the detected outliers.
statistics & probability