EndGraph: an Efficient Distributed Graph Preprocessing System

Tianfeng Liu, Dan Li
DOI: https://doi.org/10.1109/icdcs54860.2022.00020
2022-01-01
Abstract: Graph processing mainly includes two stages, namely, preprocessing and algorithm execution. Most previous proposals for improving the performance of graph processing systems focus on the algorithm execution stage and simply ignore the preprocessing overhead. In this work, however, we argue that the cost of preprocessing cannot be ignored, since in state-of-the-art systems the preprocessing time is much longer than the algorithm execution time. We propose EndGraph, a distributed graph preprocessing system, to improve preprocessing performance. First, for graph partitioning, we find that existing systems either assign imbalanced preprocessing workloads or spend too much time on graph partitioning. Hence, EndGraph proposes a novel chunk-based partition algorithm that balances preprocessing workloads and achieves the theoretical lower bound of time complexity. Second, for graph construction (converting the data layout from an edge array to adjacency lists), existing systems use counting sort, which is inefficient in both computation and communication. EndGraph employs a novel two-level graph construction method that carefully decouples graph construction into intra-machine and inter-machine construction. Our extensive evaluation results show that, compared with five state-of-the-art systems (LFGraph, PowerLyra, PowerGraph, D-Galois, and Gemini), EndGraph improves preprocessing performance by 4.72× to 35.76×. To show the generality of EndGraph, we integrate it with D-Galois and Gemini, and it improves end-to-end graph processing performance (including preprocessing and algorithm execution) by 2.96× to 7.44×.
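The abstract only names the two preprocessing steps, so the following is a minimal single-machine sketch of the ideas involved, not EndGraph's algorithm. It assumes Python and uses hypothetical helper names (`build_csr`, `chunk_partition`). `build_csr` shows the counting-sort baseline the abstract mentions for converting an edge array into adjacency lists (CSR form). `chunk_partition` illustrates one plausible way to cut vertex IDs into contiguous chunks with roughly equal edge counts, which is the balancing goal a chunk-based partitioner targets.

```python
from bisect import bisect_left
from typing import List, Tuple

Edge = Tuple[int, int]  # (source, destination)


def build_csr(num_vertices: int, edges: List[Edge]) -> Tuple[List[int], List[int]]:
    """Counting-sort baseline (hypothetical helper): edge array -> CSR adjacency lists."""
    # Pass 1: count the out-degree of each source vertex (shifted by one slot).
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum turns the counts into start offsets for each vertex's neighbor list.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Pass 2: scatter each destination into its source's slot (stable within a source).
    cursor = offsets[:-1]  # slicing makes a copy; the next free slot per vertex
    neighbors = [0] * len(edges)
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


def chunk_partition(offsets: List[int], num_parts: int) -> List[int]:
    """Hypothetical chunking: split vertex IDs into contiguous ranges of ~equal edge count.

    Returns num_parts + 1 boundaries; part p owns vertices [b[p], b[p + 1]).
    """
    num_vertices = len(offsets) - 1
    total_edges = offsets[-1]
    boundaries = [0]
    for p in range(1, num_parts):
        target = total_edges * p // num_parts
        # First vertex whose start offset reaches the target edge count.
        v = bisect_left(offsets, target, boundaries[-1], num_vertices)
        boundaries.append(v)
    boundaries.append(num_vertices)
    return boundaries


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0), (3, 1), (3, 2)]
    offsets, neighbors = build_csr(4, edges)
    print(offsets, neighbors)
    print(chunk_partition(offsets, 2))
```

For the seven-edge example, the two chunks own vertices 0-1 (3 edges) and vertices 2-3 (4 edges). Balancing by edge count rather than vertex count matters because per-vertex preprocessing work grows with degree; the sketch runs on one machine and does not model the inter-machine exchange the abstract attributes to EndGraph's two-level construction.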
What problem does this paper attempt to address?