A Two-Phase Method to Balance the Result of Distributed Graph Repartitioning
He Li, Jianbin Huang, Hang Yuan, Jiangtao Cui, Xiaoke Ma, Shaojie Qiao, Xindong Wu
DOI: https://doi.org/10.1109/TBDATA.2021.3070194
2022-01-01
IEEE Transactions on Big Data
Abstract: With the growing popularity of graph-structured data in areas such as the Web, social networks, communication networks, and knowledge graphs, there is an increasing need to partition and repartition large graphs in distributed systems. However, existing graph repartitioning methods are known to be inefficient in distributed environments, and most of them lack a mechanism for trading off edge cut against load balance. In this article, we introduce a new two-phase method to improve the result of distributed graph repartitioning. In the first phase, a local method identifies, within each partition, all candidate vertices whose migration could improve both load balance and edge cut at once. In the second phase, the selected vertices are migrated among the given initial partitions to improve the repartitioning result. During this phase, we adopt a synchronous vertex migration method that addresses edge cut and load balance together. Extensive experimental results demonstrate that the proposed method outperforms existing methods in communication cost, running time, edge cut, and load balance. We also run SSSP and PageRank applications on Giraph using the repartitioning result to demonstrate the efficiency of the proposed method.
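The abstract does not give algorithmic details, so the following Python sketch only illustrates the general shape of a two-phase repartitioning step: select candidate vertices locally, then migrate them in one synchronous round. It is not the authors' algorithm. The gain rule (move a vertex to the partition holding most of its neighbors), the fixed per-partition `capacity`, and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def load_imbalance(part, k):
    """Largest partition size divided by the average size (1.0 is perfect)."""
    sizes = Counter(part.values())
    return max(sizes.get(p, 0) for p in range(k)) / (len(part) / k)


def select_candidates(adj, part, k):
    """Phase 1 (assumed rule): a vertex is a candidate if some other
    partition holds strictly more of its neighbors than its own does."""
    candidates = []
    for v in sorted(adj):
        counts = Counter(part[n] for n in adj[v])
        home = part[v]
        best, best_count = home, counts.get(home, 0)
        for p in range(k):
            if counts.get(p, 0) > best_count:
                best, best_count = p, counts[p]
        if best != home:
            gain = best_count - counts.get(home, 0)
            candidates.append((gain, v, home, best))
    return candidates


def migrate(candidates, part, capacity):
    """Phase 2 (assumed rule): apply all moves decided on the same snapshot,
    highest gain first, skipping moves that would exceed `capacity`."""
    sizes = Counter(part.values())
    new_part = dict(part)
    for gain, v, src, dst in sorted(candidates, key=lambda c: (-c[0], c[1])):
        if sizes[dst] + 1 <= capacity:
            new_part[v] = dst
            sizes[dst] += 1
            sizes[src] -= 1
    return new_part


# Two triangles joined by the edge (2, 3); vertex 2 starts in the wrong partition.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
adj = build_adjacency(edges)

candidates = select_candidates(adj, part, k=2)
new_part = migrate(candidates, part, capacity=3)

print(edge_cut(edges, part), "->", edge_cut(edges, new_part))
print(load_imbalance(part, 2), "->", load_imbalance(new_part, 2))
```

In this toy graph, moving vertex 2 lowers the edge cut and evens out the partition sizes at the same time. A real distributed implementation would also have to handle conflicting simultaneous moves and communication between workers, which this single-process sketch ignores.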