FABLE: Approximate Butterfly Counting in Bipartite Graph Stream with Duplicate Edges

Guozhang Sun, Yuhai Zhao, Yuan Li
DOI: https://doi.org/10.1145/3627673.3679812
2024-01-01
Abstract: A bipartite graph models the relationships between two different sets of entities. In real-world applications such as customer-product interactions in e-commerce, such graph data are increasingly dynamic and arrive as a stream that contains duplicate edges. A butterfly, i.e., a (2,2)-biclique, is the simplest cohesive substructure in a bipartite graph and is of great importance. However, estimating the number of butterflies in a large-scale, highly dynamic bipartite graph stream with limited memory is challenging. Moreover, existing butterfly counting methods assume that the bipartite graph stream contains no duplicate edges, which reduces their accuracy on streams that do contain duplicates. In this paper, we propose FABLE, a Fixed-size memory Approximate Butterfly counting algorithm for dupLicate Edges in bipartite graph streams. In FABLE, we compute the number of distinct edges by maintaining an ordered list of edge priorities that is used for both replacement and sampling. We prove that the estimator is unbiased and derive the variance of the estimated butterfly count. Extensive experiments on five real-world datasets confirm that our approach achieves higher accuracy than the baseline method under the same memory usage.
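The abstract describes the mechanism only at a high level. The Python sketch below is an illustration under stated assumptions and is not the authors' FABLE algorithm. It assumes a bottom-k scheme with four parts:

- Each edge receives a priority from a deterministic hash, so every duplicate of an edge gets the same priority.
- The sample keeps the k distinct edges with the smallest priorities.
- The number of distinct edges is estimated from the k-th smallest priority, using the standard k-minimum-values estimator.
- The butterflies found in the sample are scaled up by the probability that all four edges of a butterfly were sampled.

The names `edge_priority`, `count_butterflies`, and `BottomKButterflySketch`, the BLAKE2b hash, and the parameters `k` and `seed` are hypothetical choices made for this sketch.

```python
import hashlib
import heapq
from collections import defaultdict
from itertools import combinations


def edge_priority(u, v, seed=0):
    """Hypothetical: map an edge to a pseudo-random priority in [0, 1); duplicates get the same value."""
    data = f"{seed}|{u!r}|{v!r}".encode()
    h = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
    return (h >> 11) / float(1 << 53)  # keep 53 bits so the float is exact and < 1


def count_butterflies(edges):
    """Exact (2,2)-biclique count over the distinct edges (u, v), u on the left side."""
    by_right = defaultdict(set)  # right vertex -> set of left neighbours (dedups repeats)
    for u, v in edges:
        by_right[v].add(u)
    shared = defaultdict(int)  # (left a, left b) -> number of common right neighbours
    for lefts in by_right.values():
        for a, b in combinations(sorted(lefts), 2):
            shared[(a, b)] += 1
    # any two common right neighbours of a left pair close one butterfly
    return sum(c * (c - 1) // 2 for c in shared.values())


class BottomKButterflySketch:
    """Hypothetical fixed-size sketch keeping the k distinct edges with the smallest priorities."""

    def __init__(self, k, seed=0):
        if k < 4:
            raise ValueError("k must be at least 4 to hold a butterfly")
        self.k = k
        self.seed = seed
        self._heap = []        # max-heap via negated priority: (-p, u, v)
        self._sampled = set()  # edges currently held in the sample

    def add(self, u, v):
        if (u, v) in self._sampled:
            return  # duplicate of a sampled edge: nothing changes
        p = edge_priority(u, v, self.seed)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-p, u, v))
            self._sampled.add((u, v))
        elif p < -self._heap[0][0]:
            # evict the sampled edge with the largest priority
            _, old_u, old_v = heapq.heapreplace(self._heap, (-p, u, v))
            self._sampled.discard((old_u, old_v))
            self._sampled.add((u, v))
        # else: priority >= current threshold, so the edge (new or repeated) is not sampled

    def distinct_edges(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # exact: every distinct edge is stored
        return (self.k - 1) / -self._heap[0][0]  # k-minimum-values estimate

    def butterflies(self):
        sampled = count_butterflies(self._sampled)
        if len(self._heap) < self.k:
            return float(sampled)  # sample holds the whole graph
        m = self.distinct_edges()
        # chance that 4 given distinct edges all fall in a uniform k-of-m sample
        prob = 1.0
        for i in range(4):
            prob *= (self.k - i) / (m - i)
        return sampled / prob
```

When a duplicate edge arrives, it is either already in the sample or has a priority at or above the current eviction threshold. Replaying a stream of duplicates therefore leaves the sketch unchanged. The scaling step is a Horvitz-Thompson correction. It is unbiased only when the number of distinct edges is known exactly, so plugging in the estimated count does not by itself reproduce the unbiasedness and variance results that the abstract reports for FABLE.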