Compact Estimator for Streaming Triangle Counting

Jiqing Gu,Chao Song,Haipeng Dai,Li Lu,Ming Liu
DOI: https://doi.org/10.1109/tkde.2024.3371228
IF: 9.235
2024-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract:Streaming triangle counting is a critical issue in graph stream mining, with applications in dense subgraph discovery, web mining, anomaly detection, and more. Recent efforts have focused on estimating triangle counts in graph streams, primarily through sampling methods. However, because of limited memory resources for handling high speed streams, traditional sampling methods suffer from reduced sampling rate and thereby performance loss. In this paper, we propose a new compact data structure called uHLL to process edge streams by considering the tradeoff between estimation accuracy and memory efficiency. Furthermore, different from conventional triangle counting algorithms, we solve the estimation of union set cardinality for edge-local triangle count under both centralized and distributed framework, so as to efficiently estimate the global triangle count by a one-pass streaming algorithm. To the best of our knowledge, this is the first implementation of a distributed framework using a compact data structure for streaming triangle counting. We provide theoretical proof of unbiasedness and derive the variance of the union set and global triangle count. We compare our scheme with 11 algorithms, showing that under the same experimental setting, uHLL and distributed uHLL are at least 2.3 and 1.7 times more accurate than the state-of-the-art, respectively.
What problem does this paper attempt to address?