BlockGraphChi: Enabling Block Update in Out-of-Core Graph Processing
Zhiyuan Shao, Zhenjie Mei, Xiaofeng Ding, Hai Jin
DOI: https://doi.org/10.1007/s10766-017-0532-z
2017-01-01
International Journal of Parallel Programming
Abstract: In the past several years, many out-of-core graph processing systems have been built to process big graph datasets on computer systems with limited main memory. Because graph algorithms are iterative, most of these systems use the synchronous execution model to organize the computation. That is, they divide the computation into multiple rounds, and each round corresponds to one iteration of the graph algorithm. To fully utilize disk bandwidth, these systems sequentially scan the whole graph dataset in each iteration. However, the graph dataset under processing may be huge, so more iterations generally mean larger I/O overhead. Asynchronous implementations of the synchronous execution model allow messages to be passed within an iteration, but their effectiveness is still limited: in such a model, at most one message can be passed from one vertex to another. In this paper, we investigate the idea of block updating within the synchronous execution model of out-of-core graph processing systems. With this new model, the system runs the graph algorithm on the loaded subgraph (i.e., block) until that subgraph reaches local convergence. It then switches to other subgraphs and continues this process until global convergence is reached. We implement this new model in GraphChi (the resulting system is called BlockGraphChi) and propose a companion graph partitioning method named DMLP.
Through this study, we found the following, compared with the original execution model of GraphChi: (1) the new model can generally reduce the number of iterations (and thus the I/O overhead) of graph algorithms, although the extent of the reduction depends on the graph partitioning method and the properties of the algorithms; (2) the new model can dramatically reduce the overall execution time of graph traversal algorithms (by up to 31.4×), and a better partitioning method leads to better performance; (3) the new model is much less effective at improving the overall performance of fixed-point algorithms such as PageRank, due to the increased computational overhead.
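A toy model can show why block updating reduces the number of full passes over the data. The sketch below uses an unweighted BFS (a graph traversal algorithm) as its stand-in and rests on several assumptions of its own: vertices are split into equal ID intervals, and each block holds the in-edges of its interval, loosely in the spirit of GraphChi's shards. The baseline applies one in-place relaxation pass per loaded block. Each "sweep" (loading every block once) serves as a proxy for one iteration's I/O. The function names `relax_block` and `bfs_sweeps` are hypothetical. This is not the BlockGraphChi implementation and does not model the DMLP partitioning method.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (src, dst); unweighted, so each hop costs 1
INF = float("inf")


def relax_block(dist: List[float], in_edges: List[Edge], to_convergence: bool) -> bool:
    """Relax the in-edges of one loaded block; return True if any distance changed.

    With to_convergence=False this is a single pass (baseline model).
    With to_convergence=True the block is iterated to local convergence
    while it is still "in memory" (block-update model).
    """
    changed_any = False
    while True:
        changed = False
        for u, v in in_edges:
            if dist[u] + 1 < dist[v]:
                dist[v] = dist[u] + 1  # in-place update, visible to later edges
                changed = True
        changed_any |= changed
        if not (to_convergence and changed):
            return changed_any


def bfs_sweeps(num_vertices: int, edges: List[Edge], num_blocks: int,
               source: int, block_update: bool) -> Tuple[List[float], int]:
    """Run BFS block by block; return (distances, number of full sweeps)."""
    # Hypothetical interval partitioning: block b owns vertex IDs [b*size, (b+1)*size).
    size = -(-num_vertices // num_blocks)  # ceiling division
    blocks: List[List[Edge]] = [[] for _ in range(num_blocks)]
    for u, v in edges:
        blocks[v // size].append((u, v))  # group each edge by its destination's interval

    dist: List[float] = [INF] * num_vertices
    dist[source] = 0
    sweeps = 0
    while True:
        sweeps += 1  # one sweep = every block loaded from "disk" once
        changed = False
        for in_edges in blocks:
            changed |= relax_block(dist, in_edges, block_update)
        if not changed:  # a sweep with no updates means global convergence
            return dist, sweeps


if __name__ == "__main__":
    # A path 0 -> 1 -> ... -> 7 whose edges are stored in reverse order,
    # so a single pass per block propagates distances only one hop at a time.
    edges = [(i, i + 1) for i in reversed(range(7))]
    for mode in (False, True):
        dist, sweeps = bfs_sweeps(8, edges, num_blocks=2, source=0, block_update=mode)
        print("block_update" if mode else "baseline", sweeps, dist)
```

On this deliberately unfavorable edge order, the baseline needs 7 sweeps and block updating needs 2, counting the final sweep that detects convergence. Both produce the same distances. The cost of block updating is the extra in-memory passes over each block, which mirrors the computational overhead the abstract reports for fixed-point algorithms.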