Clustering of high-dimensional observations

Yong Wang, Reza Modarres
DOI: https://doi.org/10.1080/10485252.2024.2378904
2024-07-26
Journal of Nonparametric Statistics
Abstract: We present a novel clustering method for high-dimensional, low sample size (HDLSS) data. The method is distance-based and uses the distance concentration phenomenon and the limiting values of the dissimilarity indices to construct clusters. We describe an algorithm that orders each row of the dissimilarity matrix to estimate the change points that define cluster boundaries. We then construct an agreement matrix of the Rand indices of the row clusters. The minimum of the row sums of the agreement matrix identifies the best clusters. We prove that the new method achieves perfect clustering as the number of features diverges for a fixed sample size. Several examples illustrate the proposed method. We compare the new method with four other clustering techniques, including high-dimensional k-means, minimal spanning tree and Hierarchical Scan. The clustering methods are applied to the Lymphoma data set.
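The abstract describes the pipeline only at a high level, so the following is a minimal Python/NumPy sketch of it under stated assumptions, not the authors' implementation. It assumes the number of clusters k is known and, as a hypothetical change-point rule, cuts each sorted row of the distance matrix at its k - 1 largest gaps. The agreement matrix is filled with 1 - Rand so that, as in the abstract, the smallest row sum marks the best partition; the abstract does not say whether the paper stores Rand or 1 - Rand. The names `pairwise_distances`, `row_partition`, `rand_index` and `cluster_hdlss` are hypothetical. The toy data rely on distance concentration: with unit-variance noise and a mean shift of δ in every coordinate, the distance divided by √p concentrates near √(2 + δ²), so within-cluster and between-cluster distances fall into tight, separated bands as p grows.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (n x p)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from round-off


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair counted once
    return float(np.mean(same_a[iu] == same_b[iu]))


def row_partition(D, i, k):
    """Cluster all points using only row i of D.

    HYPOTHETICAL change-point rule (assumption, not from the paper): sort the
    distances from point i to the other points and cut at the k - 1 largest
    gaps between consecutive sorted values.
    """
    n = D.shape[0]
    others = np.delete(np.arange(n), i)
    order = others[np.argsort(D[i, others])]
    gaps = np.diff(D[i, order])
    # Gap index c separates sorted positions c and c + 1; empty when k == 1.
    cuts = np.sort(np.argsort(gaps)[len(gaps) - (k - 1):])
    labels = np.empty(n, dtype=int)
    labels[i] = 0  # point i joins the segment of its nearest neighbours
    labels[order] = np.searchsorted(cuts, np.arange(n - 1), side="left")
    return labels


def cluster_hdlss(X, k):
    """Return the row partition that disagrees least with all the others."""
    D = pairwise_distances(X)
    parts = [row_partition(D, i, k) for i in range(D.shape[0])]
    # Assumption: entries are 1 - Rand, so the best row has the smallest row sum.
    A = np.array([[1.0 - rand_index(u, v) for v in parts] for u in parts])
    best = int(np.argmin(A.sum(axis=1)))
    return parts[best], A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.repeat(np.arange(3), [6, 7, 7])  # n = 20 observations
    p = 5000                                    # p >> n (HDLSS setting)
    # Cluster c has mean c in every coordinate plus unit-variance noise.
    X = rng.normal(size=(truth.size, p)) + truth[:, None]
    labels, A = cluster_hdlss(X, k=3)
    print("Rand index vs. truth:", rand_index(labels, truth))
```

In this toy setup, rows from the middle cluster see the other two clusters at about the same distance (√(3p)), so their k - 1 largest gaps can land in the wrong places. The agreement step matters because the rows from the outer clusters all produce the same partition and outvote them.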
statistics & probability