Parallelization of butterfly counting on hierarchical memory
Zhibin Wang, Longbin Lai, Yixue Liu, Bing Shui, Chen Tian, Sheng Zhong
DOI: https://doi.org/10.1007/s00778-024-00856-x
2024-06-09
The VLDB Journal
Abstract: Butterfly (a cyclic graph motif) counting is a fundamental task with many applications in graph analysis; it aims to compute the number of butterflies in a large graph. With the rapid growth of graph data, butterfly counting is increasingly challenging because of its super-linear time complexity and large memory consumption. In this paper, we study I/O-efficient algorithms for butterfly counting on hierarchical memory. Existing algorithms of this kind cannot guarantee I/O optimality. Observing that, in order to count butterflies, it suffices to "witness" a subgraph instead of the whole structure, we propose a new class of algorithms called semi-witnessing algorithms. We prove that a semi-witnessing algorithm is not restricted by the lower bound of a witnessing algorithm, and give a new bound of . Subsequently, we develop the algorithm that manages to approach the I/O lower bound, and thus claim its optimality. Finally, we investigate the parallelization of to improve its performance and scalability. To support various hardware configurations, we introduce a general parallel framework, . Our analysis indicates that the key to implementing on multi-core CPUs lies in fine-grained task division. Furthermore, we extend the CPU-tailored to harness the extensive parallelism that GPUs provide. Our experimental results show that performs better than established algorithms such as , - and - . Thanks to its I/O-efficient design, can handle large graphs that exceed the main memory capacity on both CPUs and GPUs. A significant result is that can manage butterfly counting on the Clueweb graph, which has 37 billion edges and quintillions ( ) of butterflies.
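For readers unfamiliar with the task, the following is a minimal in-memory sketch of butterfly counting, where a butterfly is a 2x2 biclique (a 4-cycle) in a bipartite graph. It is a textbook wedge-based baseline and not the I/O-efficient or parallel algorithm proposed in the paper. The function name `count_butterflies` and the input convention (edges given as `(left, right)` pairs, with left and right vertex identifiers drawn from separate namespaces) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def count_butterflies(edges):
    """Count butterflies in a bipartite graph given as (left, right) edge pairs.

    Illustrative baseline only: the whole graph is assumed to fit in memory.
    """
    # Map each right vertex to the set of its left neighbours.
    # Using a set means duplicate edges are counted once.
    left_neighbours = defaultdict(set)
    for left, right in edges:
        left_neighbours[right].add(left)

    # A wedge is a path a - v - b with a, b on the left and v on the right.
    # Count how many right vertices each unordered left pair (a, b) shares.
    wedge_count = defaultdict(int)
    for lefts in left_neighbours.values():
        for a, b in combinations(sorted(lefts), 2):
            wedge_count[(a, b)] += 1

    # Every choice of two shared right vertices for a left pair is one butterfly.
    return sum(c * (c - 1) // 2 for c in wedge_count.values())
```

The number of wedges enumerated grows with the sum of squared degrees of the right-side vertices, which is one way to see the super-linear cost mentioned in the abstract.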
computer science, information systems, hardware & architecture