Better Algorithms for Counting Triangles in Data Streams

Andrew McGregor,Sofya Vorotnikova,Hoa T. Vu
DOI: https://doi.org/10.1145/2902251.2902283
2016-06-15
Abstract:We present space-efficient data stream algorithms for approximating the number of triangles in a graph up to a factor 1+ε. While it can be shown that determining whether a graph is triangle-free is not possible in sub-linear space, a large body of work has focused on minimizing the space required in terms of the number of triangles T (or a lower bound on this quantity) and other parameters including the number of nodes n and the number of edges m. Two models are important in the literature: the arbitrary order model in which the stream consists of the edges of the graph in arbitrary order and the adjacency list order model in which all edges incident to the same node appear consecutively. We improve over the state of the art results in both models. For the adjacency list order model, we show that ~O(ε-2m/√T) space is sufficient in one pass and ~O(ε-2m3/2/T) space is sufficient in two passes where the ~O(·) notation suppresses log factors. For the arbitrary order model, we show that ~O(ε-2m/√T) space suffices given two passes and that ~O(ε-2m3/2/T) space suffices given three passes and oracle access to the degrees. Finally, we show how to efficiently implement the "wedge sampling" approach to triangle estimation in the arbitrary order model. To do this, we develop the first algorithm for lp sampling such that multiple independent samples can be generated with O(polylog n) update time; this primitive is widely applicable and this result may be of independent interest.
What problem does this paper attempt to address?