GGraph: an Efficient Structure-Aware Approach for Iterative Graph Processing
Beibei Si,Yuxuan Liang,Jin Zhao,Yu Zhang,Xiaofei Liao,Hai Jin,Haikun Liu,Lin Gu
DOI: https://doi.org/10.1109/tbdata.2020.3019641
2020-01-01
IEEE Transactions on Big Data
Abstract: Many iterative graph processing systems have recently been developed to analyze graphs. Although they are effective in different respects, one important issue has not yet been addressed. Real-world graphs follow a power-law degree distribution, in which a small number of vertices have high degrees (i.e., are connected to most other vertices in the graph). These vertices are called hot-vertices and usually require more iterations to converge. In existing solutions, hot-vertices may be allocated to many or even all graph partitions along with other vertices that converge easily. As a result, the partitions containing hot-vertices have to be loaded repeatedly, and the system consequently suffers from high data access cost, even though perhaps only a few vertices in these partitions are active. To cope with this issue, we develop an efficient open-source graph partition manager, called GGraph, which can be integrated into existing graph processing systems to efficiently support iterative graph processing by taking into account the power-law property of the graph structure. It uses a novel low-overhead graph repartitioning scheme to dynamically partition the hot-vertices together, so as to avoid loading the inactive vertices that share a partition with the repeatedly processed hot-vertices. In this way, it not only reduces data access cost but also enables privileged processing of the hot-vertices. To further increase the convergence speed, this work also proposes a scheduling algorithm that prioritizes the processing of the hot-vertices with low overhead. To demonstrate the efficiency of GGraph, we plug it into four state-of-the-art graph processing systems, i.e., Gemini, GraphChi, Chaos, and GridGraph. Experimental results show that GGraph improves their performance by up to 3.2 times, 3.8 times, 3.9 times, and 3.5 times, respectively.
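The idea of grouping hot-vertices so that fewer partitions need to be loaded can be illustrated with a minimal sketch. This is not GGraph's actual repartitioning scheme, which the abstract does not specify. It assumes that hot-vertices are simply the `num_hot` highest-degree vertices, that they all fit in one dedicated partition, and that the remaining vertices are spread round-robin. The function names `repartition_hot_vertices` and `partitions_to_load` and all parameters are hypothetical.

```python
from collections import Counter


def repartition_hot_vertices(edges, num_partitions, num_hot):
    """Place the num_hot highest-degree vertices in partition 0.

    Assumption for illustration only: "hot" means "highest degree".
    All other vertices are spread round-robin over partitions 1..n-1.
    """
    if num_partitions < 2:
        raise ValueError("need one hot partition and at least one cold partition")

    # Count the degree of every vertex in an undirected edge list.
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1

    # Rank by descending degree; break ties by vertex id for determinism.
    ranked = sorted(degree, key=lambda v: (-degree[v], v))
    hot, cold = ranked[:num_hot], ranked[num_hot:]

    partitions = [[] for _ in range(num_partitions)]
    partitions[0] = list(hot)
    for i, vertex in enumerate(cold):
        partitions[1 + i % (num_partitions - 1)].append(vertex)
    return partitions


def partitions_to_load(partitions, active):
    """Return the indices of partitions holding at least one active vertex."""
    return [i for i, part in enumerate(partitions)
            if any(v in active for v in part)]


if __name__ == "__main__":
    # A small star-like graph: vertex 0 is the hub, vertex 1 has one extra edge.
    edges = [(0, i) for i in range(1, 10)] + [(1, 2)]
    parts = repartition_hot_vertices(edges, num_partitions=3, num_hot=2)
    # In late iterations only the hot-vertices are assumed to remain active.
    print(partitions_to_load(parts, active={0, 1}))
```

In this toy setup, once only the hot-vertices remain active, a single partition has to be loaded per iteration. If the same two vertices had been scattered across partitions, each partition holding one would have to be loaded along with its inactive vertices.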