BHP:BSP Model Oriented Hash Graph Data Partition with Load Balancing

Shuang ZHOU,Yubin BAO,Zhigang WANG,Fangling LENG,Ge YU,Chao DENG,Leitao GUO
DOI: https://doi.org/10.3778/j.issn.1673-9418.1305052
2014-01-01
Abstract:Graph data partition is a critical technical problem in the large-scale graph processing system based on bulk synchronous parallel (BSP) model. Traditional graph partition technology requires many iterations, the time com-plexity is too high, and the partition results don’t have the mapping information from vertexes to partitions. So those algorithms are not suitable for partitioning graph data for the BSP model based systems. This paper presents a new Hash partition algorithm that implements load balance, called balanced Hash partition (BHP). In order to achieve the balance of the number of out degrees in each partition as much as possible, the concept of virtual bucket is introduced. Then these virtual buckets are reorganized into desired partitions by using a greedy algorithm. In this way, the algorithm can ensure the load balance of each partition. At the same time, the data localization strategy makes the data on the split locate on the corresponding node as much as possible. So, the overhead of data migration during the data loading can be reduced. This paper does some experiments to compare the performance between BHP and the classic Hash algorithm from multiple aspects. The results show that the BHP algorithm can improve the job executing efficiency,reduce the number of sending messages, and effectively solve the problems of load imbalance and too many interac-tive edges among partitions.
What problem does this paper attempt to address?