Exploring the Hidden Dimension in Graph Processing

Mingxing Zhang, Yongwei Wu, Kang Chen, Xuehai Qian, Xue Li, Weimin Zheng
DOI: https://doi.org/10.5555/3026877.3026900
2016-01-01
Abstract: Task partitioning of a graph-parallel system is traditionally considered equivalent to the graph partition problem. Such equivalence exists because the properties associated with each vertex/edge are normally considered indivisible. However, this assumption is not true for many Machine Learning and Data Mining (MLDM) problems: instead of a single value, a vector of data elements is defined as the property for each vertex/edge. This feature opens a new dimension for task partitioning because a vertex could be divided and assigned to different nodes. To explore this new opportunity, this paper presents 3D partitioning, a novel category of task partition algorithms that significantly reduces network traffic for certain MLDM applications. Based on 3D partitioning, we build a distributed graph engine, CUBE. Our evaluation results show that CUBE outperforms the state-of-the-art graph-parallel system PowerLyra by up to 4.7× (up to 7.3× speedup against PowerGraph).
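The idea of dividing a vertex across nodes can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than CUBE's actual algorithm: the function name `assign_3d`, the arrangement of nodes into equally sized layers, the contiguous slicing of each property vector, and the modulo placement of vertices within a layer are all hypothetical.

```python
def assign_3d(num_vertices, vector_len, num_nodes, num_layers):
    """Map each (vertex, element) pair to a node id.

    Hypothetical layout: the nodes form `num_layers` groups of equal size.
    Layer l holds one contiguous slice of every vertex's property vector,
    and vertices are spread over the nodes of a layer by a simple modulo.
    """
    if num_nodes % num_layers != 0:
        raise ValueError("num_nodes must be divisible by num_layers")
    nodes_per_layer = num_nodes // num_layers

    placement = {}
    for v in range(num_vertices):
        for e in range(vector_len):
            # Third dimension: which slice of the vector this element is in.
            layer = e * num_layers // vector_len
            # Conventional graph partitioning inside the layer.
            node_in_layer = v % nodes_per_layer
            placement[(v, e)] = layer * nodes_per_layer + node_in_layer
    return placement


if __name__ == "__main__":
    # 4 vertices, 4-element property vectors, 4 nodes split into 2 layers.
    for key, node in sorted(assign_3d(4, 4, 4, 2).items()):
        print(key, "->", node)
```

With `num_layers=1` every element of a vertex lands on the same node, which corresponds to the traditional view of an indivisible vertex; with more layers, the elements of one vertex are spread over several nodes.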