The Communication Complexity of Set Intersection and Multiple Equality Testing.

Dawei Huang, Seth Pettie, Yixiang Zhang, Zhijun Zhang
DOI: https://doi.org/10.1137/20m1326040
2021-01-01
SIAM Journal on Computing
Abstract: In this paper we explore fundamental problems in randomized communication complexity such as computing Set Intersection on sets of size $k$ and Equality Testing between vectors of length $k$. Brody et al. [BCK+16] and Sağlam and Tardos [ST13] showed that for these types of problems, one can achieve optimal communication volume of $O(k)$ bits, with a randomized protocol that takes $O(\log^* k)$ rounds. They also proved [BCK+16, ST13] that this is one point along the optimal round-communication tradeoff curve. Aside from rounds and communication volume, there is a third parameter of interest, namely the error probability $p_{\mathrm{err}}$. It is straightforward to show that protocols for Set Intersection or Equality Testing need to send $\Omega(k + \log p_{\mathrm{err}}^{-1})$ bits. Is it possible to simultaneously achieve optimality in all three parameters, namely $O(k + \log p_{\mathrm{err}}^{-1})$ communication and $O(\log^* k)$ rounds? In this paper we prove that there is no universally optimal algorithm, and complement the existing round-communication tradeoffs [BCK+16, ST13] with a new tradeoff between rounds, communication, and probability of error. In particular: Any protocol for solving Multiple Equality Testing in $r$ rounds with failure probability $p_{\mathrm{err}} = 2^{-E}$ has communication volume $\Omega(Ek^{1/r})$. There exists a protocol for solving Multiple Equality Testing in $r + \log^*(k/E)$ rounds with $O(k + rEk^{1/r})$ communication, thereby essentially matching our lower bound and that of [BCK+16, ST13]. Lower bounds on Equality Testing extend to Set Intersection, for every $r$, $k$, and $p_{\mathrm{err}}$ (which is trivial); in the reverse direction, upper bounds on Equality Testing for $r$, $k$, $p_{\mathrm{err}}$ imply similar upper bounds on Set Intersection with parameters $r + 1$, $k$, and $p_{\mathrm{err}}$. Our original motivation for considering $p_{\mathrm{err}}$ as an independent parameter came from the problem of enumerating triangles in distributed (CONGEST) networks having maximum degree $\Delta$. We prove that this problem can be solved in $O(\Delta/\log n + \log\log \Delta)$ time with high probability $1 - 1/\mathrm{poly}(n)$. This beats the trivial (deterministic) $O(\Delta)$-time algorithm and is superior to the $\tilde{O}(n^{1/3})$ algorithm of [CPZ19, CS19] when $\Delta = \tilde{O}(n^{1/3})$.
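To get a feel for the tradeoff stated in the abstract, the following minimal sketch evaluates the bound expressions for Multiple Equality Testing with all hidden constants dropped. It is an illustration only, not the authors' protocol: the function names (`log_star`, `lower_bound_bits`, `upper_bound_bits`, `upper_bound_rounds`) are hypothetical, and the choice of base-2 logarithms for $\log^*$ is an assumption.

```python
import math


def log_star(x: float) -> int:
    """Iterated logarithm (base 2, an assumption): how many times log2
    must be applied before the value drops to 1 or below."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


def lower_bound_bits(k: int, E: float, r: int) -> float:
    """E * k^(1/r): the expression inside the Omega(.) lower bound,
    with the hidden constant dropped."""
    return E * k ** (1.0 / r)


def upper_bound_bits(k: int, E: float, r: int) -> float:
    """k + r * E * k^(1/r): the expression inside the O(.) upper bound,
    with the hidden constant dropped."""
    return k + r * E * k ** (1.0 / r)


def upper_bound_rounds(k: int, E: float, r: int) -> int:
    """r + log*(k/E): the round count of the upper-bound protocol.
    Assumes E > 0."""
    return r + log_star(k / E)


if __name__ == "__main__":
    k, E = 1024, 10  # illustrative values; p_err = 2^(-E)
    for r in (1, 2, 5, 10):
        print(r, upper_bound_rounds(k, E, r),
              round(lower_bound_bits(k, E, r), 1),
              round(upper_bound_bits(k, E, r), 1))
```

Running the loop shows the shape of the tradeoff: with few rounds the $Ek^{1/r}$ term dominates the communication, while allowing more rounds shrinks it toward the $O(k)$ term.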