When Computing Meets Heterogeneous Cluster: Workload Assignment in Graph Computation
Jilong Xue, Zhi Yang, Shian Hou, Yafei Dai
DOI: https://doi.org/10.1109/bigdata.2015.7363752
2015-01-01
Abstract: To process very large graphs, existing graph processing systems such as Pregel and Giraph usually partition the graph and distribute the computation across a large number of nodes (i.e., workers). However, because computing clusters are heterogeneous (e.g., nodes differ in bandwidth or CPU resources), blindly increasing the number of workers for a job may even degrade overall performance. In this paper, we address the question of how to distribute graph computation over a heterogeneous cluster to maximize performance. Based on the practical constraints of current systems, we consider two scenarios. For systems that use hash-based partitioning (to avoid the overhead of indexing and looking up vertices), we propose a coarse-grained mechanism that greedily selects a suitable set of workers to execute the job. For systems that allow arbitrary graph partitioning, we further propose a heterogeneity-aware streaming graph partitioning model that assigns workload at a fine-grained level. We implement these scheduling mechanisms as general middleware that can easily be adopted by existing graph computing systems. Our experiments on both a university lab cluster (46 machines) and an EC2 cluster (100 instances) show that the proposed framework can significantly improve execution performance. Compared with the default configuration (i.e., using the whole set of workers with hash-based graph partitioning), our framework reduces overall execution time by 55.9% on the lab cluster and 44.7% on the EC2 cluster, respectively.
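The abstract does not specify the cost model or the partitioning objective, so the following is only an illustrative sketch under assumed models, not the authors' algorithms. It assumes a hypothetical per-superstep cost model in which each worker has a CPU rate (`cpu`, edges per second) and a network rate (`bandwidth`, messages per second), and in which hash partitioning over k workers gives each worker 1/k of the edges, of which about (k - 1)/k produce remote messages. `greedy_select` grows a worker set one worker at a time and keeps the cheapest set seen, showing how a slow node can be left out. `weighted_ldg` is a capacity-weighted variant of the Linear Deterministic Greedy (LDG) streaming heuristic (Stanton and Kliot, 2012), swapped in here to show what fine-grained, heterogeneity-aware streaming assignment can look like. All names (`Worker`, `estimate_time`, `greedy_select`, `weighted_ldg`, `slack`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    cpu: float        # hypothetical: edges processed per second
    bandwidth: float  # hypothetical: remote messages sent per second


def estimate_time(workers, num_edges):
    """Assumed cost of one superstep under hash partitioning.

    Each of the k workers gets num_edges / k edges; about (k - 1) / k of that
    share crosses to other workers as messages. The slowest worker sets the
    superstep time.
    """
    k = len(workers)
    share = num_edges / k
    remote = share * (k - 1) / k
    return max(share / w.cpu + remote / w.bandwidth for w in workers)


def greedy_select(workers, num_edges):
    """Greedy coarse-grained selection: repeatedly add the worker that yields
    the lowest estimated time, and return the best set seen along the way."""
    remaining = list(workers)
    chosen, best_set, best_time = [], [], float("inf")
    while remaining:
        nxt = min(remaining, key=lambda w: estimate_time(chosen + [w], num_edges))
        chosen.append(nxt)
        remaining.remove(nxt)
        t = estimate_time(chosen, num_edges)
        if t < best_time:
            best_set, best_time = list(chosen), t
    return best_set, best_time


def weighted_ldg(vertex_stream, workers, num_vertices, slack=1.1):
    """Capacity-weighted LDG streaming partitioner (slack must be >= 1).

    vertex_stream yields (vertex, neighbours) pairs in arrival order. Worker i
    may hold about slack * num_vertices * cpu_i / sum(cpu) vertices, so faster
    workers receive proportionally more of the graph.
    """
    total_cpu = sum(w.cpu for w in workers)
    capacity = [slack * num_vertices * w.cpu / total_cpu for w in workers]
    sizes = [0] * len(workers)
    assignment = {}
    for v, nbrs in vertex_stream:
        # LDG score: placed neighbours in partition i, damped by its fill ratio.
        def score(i):
            overlap = sum(1 for u in nbrs if assignment.get(u) == i)
            return overlap * (1 - sizes[i] / capacity[i])

        # Only partitions still below capacity are eligible (fallback: all).
        open_parts = [i for i in range(len(workers)) if sizes[i] < capacity[i]]
        candidates = open_parts or list(range(len(workers)))
        # Ties (e.g. no placed neighbours) go to the least-filled partition.
        best = max(candidates, key=lambda i: (score(i), -sizes[i] / capacity[i]))
        sizes[best] += 1
        assignment[v] = best
    return assignment
```

With two fast workers and a third worker ten times slower, `greedy_select` stops at the two fast workers under this model, mirroring the abstract's observation that adding workers can hurt; `weighted_ldg` gives the slow worker a capacity, and hence a share of vertices, proportional to its assumed CPU rate. A real system would calibrate `cpu` and `bandwidth` from measurements rather than fixed constants.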