Abstract: SIAM Journal on Computing, Ahead of Print. The random geometric graph model [math] is a distribution over graphs in which the edges capture a latent geometry. To sample [math], we identify each of our [math] vertices with an independently and uniformly sampled vector from the [math]-dimensional unit sphere [math], and we connect pairs of vertices whose vectors are "sufficiently close," such that the marginal probability of an edge is [math]. Because of the underlying geometry, this model is natural for applications in data science and beyond. We investigate the problem of testing for this latent geometry, or, in other words, distinguishing an Erdős–Rényi graph [math] from a random geometric graph [math]. It is not too difficult to show that if [math] while [math] is held fixed, the two distributions become indistinguishable; we wish to understand how fast [math] must grow as a function of [math] for indistinguishability to occur. When [math] for constant [math], we prove that if [math], the total variation distance between the two distributions is close to 0; this improves upon the best previous bound of Brennan, Bresler, and Nagaraj (2020), which required [math], and further our result is nearly tight, resolving a conjecture of Bubeck, Ding, Eldan, and Rácz (2016) up to logarithmic factors. We also obtain improved upper bounds on the statistical indistinguishability thresholds in [math] for the full range of [math] satisfying [math], improving upon the previous bounds by polynomial factors. Our analysis uses the belief propagation algorithm to characterize the distributions of (subsets of) the random vectors conditioned on producing a particular graph. In this sense, our analysis is connected to the "cavity method" from statistical physics. To analyze this process, we rely on novel sharp estimates for the area of the intersection of a random sphere cap with an arbitrary subset of [math], which we prove using optimal transport maps and entropy-transport inequalities on the unit sphere. We believe these techniques may be of independent interest.
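The sampling procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The names `n` (number of vertices), `d` (dimension), `p` (target marginal edge probability), and `tau` (inner-product threshold) are assumptions introduced here, since the original symbols were lost in extraction. "Sufficiently close" is taken to mean that the inner product of two vectors is at least `tau`, and `tau` is estimated by Monte Carlo rather than computed exactly. An Erdős–Rényi sampler with the same edge probability is included for comparison.

```python
import math
import random


def sample_unit_vector(d, rng):
    """Uniform point on the unit sphere in R^d: normalize a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def estimate_threshold(d, p, rng, trials=20000):
    """Monte Carlo estimate of tau such that P(<u, v> >= tau) is roughly p.

    Assumption: an empirical quantile stands in for the exact threshold.
    """
    samples = sorted(
        dot(sample_unit_vector(d, rng), sample_unit_vector(d, rng))
        for _ in range(trials)
    )
    index = min(trials - 1, int((1.0 - p) * trials))
    return samples[index]


def sample_geometric_graph(n, d, p, seed=0):
    """Sample latent vectors and connect pairs whose inner product is at least tau."""
    rng = random.Random(seed)
    tau = estimate_threshold(d, p, rng)
    vectors = [sample_unit_vector(d, rng) for _ in range(n)]
    edges = {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dot(vectors[i], vectors[j]) >= tau
    }
    return vectors, edges, tau


def sample_erdos_renyi(n, p, seed=0):
    """Sample a graph where each pair is an edge independently with probability p."""
    rng = random.Random(seed)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < p
    }
```

Calling `sample_geometric_graph(40, 5, 0.2)` and `sample_erdos_renyi(40, 0.2)` yields two graphs with similar edge densities; the testing problem in the abstract asks when such pairs can be told apart as the dimension grows.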

A semiparametric two-sample hypothesis testing problem for random dot product graphs

A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs

Two-sample testing for random graphs

Hypothesis Testing of Matrix Graph Model with Application to Brain Connectivity Analysis

Hypothesis testing for general network models

Hypothesis Testing for Topological Data Analysis

Lost in the Shuffle: Testing Power in the Presence of Errorful Network Vertex Labels

Collaborative non-parametric two-sample testing

Weighted Graph-Based Two-Sample Test via Empirical Likelihood

Hypothesis Testing for Network Data with Power Enhancement

Multi-level hypothesis testing for populations of heterogeneous networks

Nonparametric High-Dimensional Multi-Sample Tests based on Graph Theory

Testing Thresholds for High-Dimensional Sparse Random Geometric Graphs

Network two-sample test for block models

Testing for Global Network Structure Using Small Subgraph Statistics

A Sampling-Based Framework for Hypothesis Testing on Large Attributed Graphs

Dimension constraints improve hypothesis testing for large-scale, graph-associated, brain-image data

Semisupervised regression in latent structure networks on unknown manifolds

A Global Homogeneity Test for High-Dimensional Linear Regression

A More Powerful Two-Sample Test in High Dimensions using Random Projection

Matrix Graph Hypothesis Testing and Application in Brain Connectivity Alternation Detection