Nonparametric High-Dimensional Multi-Sample Tests based on Graph Theory

Xiaoping Shi
DOI: https://doi.org/10.1080/10618600.2024.2358156
2024-06-19
Journal of Computational and Graphical Statistics
Abstract: High-dimensional data pose unique challenges for data processing in an era of ever-increasing data availability. Graph theory can provide structure for high-dimensional data. We introduce two key properties desirable for graphs in testing homogeneity. Roughly speaking, these properties may be described as unboundedness of edge counts under the same distribution and boundedness of edge counts under different distributions. It turns out that the minimum spanning tree violates these properties but the shortest Hamiltonian path possesses them. Based on the shortest Hamiltonian path, we propose two combinations of edge counts in multiple samples to test for homogeneity. We give the permutation null distributions of the proposed statistics as the sample sizes go to infinity. The power is analyzed by assuming that both the sample sizes and the dimensionality tend to infinity. Simulations show that our new tests behave very well overall in comparison with various competitors. Real data analyses of tumors and images further demonstrate the value of our proposed tests. Software implementing the test is available in the R package GRelevance. Supplemental materials for this article are available online.
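To make the idea of a path-based edge-count test concrete, the following is a minimal Python sketch under several assumptions. It is not the statistic proposed in the paper and does not use the GRelevance package. The exact shortest Hamiltonian path is replaced by a greedy nearest-neighbour heuristic, the statistic is simply the number of path edges joining two observations from the same sample, and the p-value comes from a finite label permutation rather than an asymptotic null distribution. The function names `greedy_path`, `within_sample_edges`, and `permutation_test` are hypothetical.

```python
import numpy as np


def greedy_path(X, start=0):
    """Nearest-neighbour heuristic for a short Hamiltonian path.

    Assumption: this only approximates the shortest Hamiltonian path.
    """
    n = len(X)
    # Pairwise Euclidean distances between all observations.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    visited = np.zeros(n, dtype=bool)
    path = [start]
    visited[start] = True
    for _ in range(n - 1):
        # Distances from the current endpoint; visited points are excluded.
        d = np.where(visited, np.inf, D[path[-1]])
        nxt = int(np.argmin(d))
        path.append(nxt)
        visited[nxt] = True
    return np.array(path)


def within_sample_edges(path, labels):
    """Count path edges whose two endpoints carry the same sample label."""
    lab = np.asarray(labels)[path]
    return int(np.sum(lab[:-1] == lab[1:]))


def permutation_test(X, labels, n_perm=999, seed=0):
    """Permutation p-value; many within-sample edges suggest heterogeneity."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    path = greedy_path(X)  # the path depends on X only, so it is built once
    observed = within_sample_edges(path, labels)
    perm = np.array([
        within_sample_edges(path, rng.permutation(labels))
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three samples of 20 observations in 50 dimensions; the third is shifted.
    X = rng.normal(size=(60, 50))
    X[40:] += 1.0
    labels = np.repeat([0, 1, 2], 20)
    print(permutation_test(X, labels))
```

Because the path is built from the pooled observations without reference to the labels, permuting the labels along a fixed path is enough to generate the reference distribution in this sketch.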
statistics & probability