Nearly Tight Bounds on Testing of Metric Properties

Yiqiao Bao, Sampath Kannan, Erik Waingarten
2024-11-14
Abstract: Given a non-negative $n \times n$ matrix viewed as a set of distances between $n$ points, we consider the property testing problem of deciding if it is a metric. We also consider the same problem for two special classes of metrics, tree metrics and ultrametrics. For general metrics, our paper is the first to consider these questions. We prove an upper bound of $O(n^{2/3}/\epsilon^{4/3})$ on the query complexity for this problem. Our algorithm is simple, but the analysis requires great care in bounding the variance on the number of violating triangles in a sample. When $\epsilon$ is a slowly decreasing function of $n$ (rather than a constant, as is standard), we prove a lower bound of matching dependence on $n$ of $\Omega (n^{2/3})$, ruling out any property testers with $o(n^{2/3})$ query complexity unless their dependence on $1/\epsilon$ is super-polynomial. Next, we turn to tree metrics and ultrametrics. While there were known upper and lower bounds, we considerably improve these bounds showing essentially tight bounds of $\tilde{O}(1/\epsilon )$ on the sample complexity. We also show a lower bound of $\Omega ( 1/\epsilon^{4/3} )$ on the query complexity. Our upper bounds are derived by doing a more careful analysis of a natural, simple algorithm. For the lower bounds, we construct distributions on NO instances, where it is hard to find a witness showing that these are not ultrametrics.
Discrete Mathematics, Data Structures and Algorithms
What problem does this paper attempt to address?
The core problem this paper addresses is: given a non-negative \(n\times n\) matrix, how can one efficiently test whether it defines a metric, and can the two special classes of tree metrics and ultrametrics be tested more efficiently still? The main contributions are:

1. **Testing of general metrics**
   - A non-adaptive algorithm with query complexity \(O(n^{2/3}/\varepsilon^{4/3})\) for testing whether an \(n\times n\) matrix is a metric.
   - A lower bound of \(\Omega(n^{2/3})\) on the query complexity when \(\varepsilon\) is a slowly decreasing function of \(n\). This rules out testers with \(o(n^{2/3})\) queries unless their dependence on \(1/\varepsilon\) is super-polynomial.
2. **Testing of tree metrics and ultrametrics**
   - Improved upper bounds over previous work: algorithms with sample complexity \(\tilde{O}(1/\varepsilon)\) and query complexity \(\tilde{O}(1/\varepsilon^{2})\).
   - Lower bounds of \(\Omega(1/\varepsilon)\) on the sample complexity and \(\Omega(1/\varepsilon^{4/3})\) on the query complexity.

### Key technical points

- **Upper bound for general metrics**
  - The tester randomly selects points and pairs of points, then checks whether any sampled triangle violates the triangle inequality.
  - The analysis hinges on estimating the number of violating triangles in the sample and bounding its variance.
- **Lower bound for general metrics**
  - Using the Behrend graph construction from graph theory, the paper proves that non-adaptive algorithms require \(\Omega(n^{2/3})\) queries in the worst case.
- **Upper bound for tree metrics and ultrametrics**
  - A more careful analysis of a natural, simple algorithm reduces the number of samples and queries required.
  - The special structure of tree metrics and ultrametrics makes violated conditions easier to detect.
- **Lower bound for tree metrics and ultrametrics**
  - The paper constructs distributions over NO instances in which a witness of non-ultrametricity is hard to find, which yields the lower bounds on sample complexity and query complexity.

### Significance of the paper

This paper is the first to study property testing for general metrics, and it also gives more efficient algorithms, with nearly tight bounds, for testing tree metrics and ultrametrics. Beyond their theoretical interest, such testers could serve as a preprocessing step that quickly screens out matrices far from satisfying the metric conditions, reducing the computational burden of subsequent, more expensive algorithms.
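
The sampling idea behind the upper bounds under "Key technical points" can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only samples points, queries every pair among them, and searches for a violating triple. It is not tuned to reach the stated query bounds. The function names, the `num_points` parameter, and the assumption that the input is a symmetric matrix given as a list of lists are all hypothetical choices made for illustration. The ultrametric check relies on the standard fact that in an ultrametric, the two largest of the three pairwise distances in any triple are equal.

```python
import itertools
import random


def triangle_violated(d, i, j, k):
    """Return True if the triple (i, j, k) breaks the triangle inequality."""
    a, b, c = d[i][j], d[j][k], d[i][k]
    return a > b + c or b > a + c or c > a + b


def ultrametric_violated(d, i, j, k):
    """Return True if the triple breaks the ultrametric (strong triangle) inequality."""
    _, mid, largest = sorted((d[i][j], d[j][k], d[i][k]))
    # In an ultrametric the two largest distances of every triple coincide.
    # Exact comparison is used here, so this assumes integer or exact entries.
    return mid != largest


def sample_tester(d, num_points, violated, rng=None):
    """Illustrative one-sided tester (assumed design, not taken from the paper).

    Samples `num_points` indices without replacement, inspects every triple
    among them, and returns a violating triple as a witness, or None if no
    violation is found in the sample.
    """
    rng = rng or random.Random()
    n = len(d)
    sample = rng.sample(range(n), min(num_points, n))
    for i, j, k in itertools.combinations(sample, 3):
        if violated(d, i, j, k):
            return (i, j, k)
    return None


if __name__ == "__main__":
    # Points on a line: a metric, but not an ultrametric.
    line = [[abs(i - j) for j in range(6)] for i in range(6)]
    print(sample_tester(line, 6, triangle_violated))
    print(sample_tester(line, 6, ultrametric_violated) is not None)
```

Because the sketch rejects only when it holds an explicit violating triple, it never rejects a true metric (or ultrametric). How large `num_points` must be to catch matrices that are far from the property is exactly what the paper's analysis addresses, and this sketch makes no claim about it.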