DiterGraph: Toward I/O-Efficient Incremental Computation over Large Graphs with Billion Edges.

Yujie Du, Zhigang Wang, Ning Wang, Luqing Xie, Zhiqiang Wei
DOI: https://doi.org/10.1109/bigcom53800.2021.00006
2021-01-01
Abstract: The growing demand for iterative computation over large-scale graphs has attracted considerable attention. Distributed disk-based systems can meet the scalability requirement as graphs grow in size, but the computation is very expensive because of heavy communication and frequent random data accesses. Alleviating these two limiting factors poses great challenges for graph partitioning, disk-oriented data management, and the iterative mechanism. This paper derives insights from the natural locality of raw graphs and proposes a lightweight partitioning algorithm, GPNL, with the goal of balancing load and accelerating communication. Accordingly, a hybrid index, RC-Index, is proposed to improve I/O efficiency by reducing disk accesses for graph data and message data. We also introduce an across-iteration mechanism (AIM) based on the extended BSP model, and then design two policies, AIMP and AIMC, to prune the message scale and accelerate message spreading, respectively. Comprehensive experiments against state-of-the-art solutions demonstrate significant performance gains over a broad spectrum of real-world and synthetic graphs at billion-edge scale.