Distributed Graph Layout for Scalable Small-world Network Analysis

George M Slota, Sivasankaran Rajamanickam, Kamesh Madduri
DOI: https://doi.org/10.48550/arXiv.1701.00503
2017-01-03
Abstract: The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, and reordering vertex and edge identifiers. In this work, we present DGL, a fast, parallel, and memory-efficient distributed graph layout strategy that is specifically designed for small-world networks (low-diameter graphs with skewed vertex degree distributions). Label propagation-based partitioning and a scalable BFS-based ordering are the main steps in the layout strategy. We show that the DGL layout can significantly improve end-to-end performance of five challenging graph analytics workloads: PageRank, a parallel subgraph enumeration program, tuned implementations of breadth-first search and single-source shortest paths, and RDF3X-MPI, a distributed SPARQL query processing engine. Using these benchmarks, we additionally offer a comprehensive analysis of how graph layout affects the performance of graph analytics with variable computation and communication characteristics.
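To make the partitioning step concrete, here is a minimal, single-process sketch of label propagation partitioning under a simple part-size cap. It is not the authors' distributed implementation: the function names `label_propagation_partition` and `edge_cut`, the adjacency-list input, the random balanced initialization, and the `max_imbalance` cap are all illustrative assumptions. Each vertex repeatedly moves to the part that holds most of its neighbors, provided that part stays under the cap. The edge cut counts edges whose endpoints land in different parts, a rough proxy for inter-task communication.

```python
import random
from collections import Counter


def label_propagation_partition(adj, num_parts, iters=10, max_imbalance=1.1, seed=0):
    """Hypothetical sequential label propagation partitioner (illustration only).

    adj: list of neighbor lists for an undirected graph (each edge stored twice).
    Returns parts, where parts[v] is the part index assigned to vertex v.
    """
    n = len(adj)
    rng = random.Random(seed)
    # Start from a balanced random assignment of vertices to parts.
    parts = [v % num_parts for v in range(n)]
    rng.shuffle(parts)
    sizes = Counter(parts)
    # Assumed balance constraint: no part may exceed this many vertices.
    cap = max_imbalance * n / num_parts
    order = list(range(n))
    for _ in range(iters):
        rng.shuffle(order)
        moved = 0
        for v in order:
            # Count how many neighbors of v sit in each part.
            counts = Counter(parts[u] for u in adj[v])
            cur = parts[v]
            best, best_count = cur, counts.get(cur, 0)
            # Move to the most common neighbor part that still has room.
            for p, c in counts.items():
                if c > best_count and sizes[p] + 1 <= cap:
                    best, best_count = p, c
            if best != cur:
                sizes[cur] -= 1
                sizes[best] += 1
                parts[v] = best
                moved += 1
        if moved == 0:  # converged: no vertex changed part this sweep
            break
    return parts


def edge_cut(adj, parts):
    """Number of undirected edges whose endpoints are in different parts."""
    crossing = sum(1 for v in range(len(adj)) for u in adj[v] if parts[u] != parts[v])
    return crossing // 2  # each undirected edge was counted from both endpoints
```

For example, on two triangles joined by a single edge, `edge_cut(adj, [0, 0, 0, 1, 1, 1])` is 1. With a tight cap, a purely move-based scheme like this can stall once every part is full, so practical partitioners typically add separate balancing or refinement phases that this sketch omits.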
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses how the in-memory graph layout (or organization) affects the performance of large-scale small-world network analysis in a distributed memory environment. Specifically, it asks how an effective layout strategy can improve the time and energy efficiency of graph computations. Graph layout here covers partitioning or replicating vertex and edge arrays, selectively replicating metadata structures, and reordering vertex and edge identifiers. These choices matter most for small-world networks, whose low diameter and skewed vertex degree distributions can cause load imbalance across tasks, increased communication time, and poor memory utilization.

The paper proposes DGL (Distributed Graph Layout), a fast, parallel, and memory-efficient distributed graph layout strategy designed specifically for small-world networks. Its two main steps are label propagation-based partitioning and a scalable breadth-first search (BFS)-based vertex ordering. Experiments show that DGL significantly improves the end-to-end performance of five challenging graph analytics workloads: PageRank, a parallel subgraph enumeration program, tuned implementations of BFS and single-source shortest paths, and RDF3X-MPI, a distributed SPARQL query processing engine. Using these benchmarks, the paper also analyzes in detail how graph layout affects the performance of graph analytics with differing computation and communication characteristics.
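The BFS-based ordering step can be illustrated in the same spirit. The sketch below relabels vertices in the order a BFS first reaches them, so that neighboring vertices tend to receive nearby identifiers, which generally helps memory locality when adjacency arrays are scanned. Starting each BFS from the highest-degree unvisited vertex, restarting for disconnected components, and the names `bfs_order` and `relabel` are assumptions made for illustration; the paper's scalable distributed ordering may choose roots and scope (for example, ordering within each part) differently.

```python
from collections import deque


def bfs_order(adj):
    """Hypothetical BFS-based relabeling: returns new_id[v] for every vertex v.

    Each BFS starts at the highest-degree unvisited vertex (ties: lower id),
    and new identifiers are handed out in BFS visit order.
    """
    n = len(adj)
    new_id = [-1] * n
    # Candidate roots, highest degree first; restarts cover disconnected graphs.
    roots = sorted(range(n), key=lambda v: (-len(adj[v]), v))
    next_id = 0
    for r in roots:
        if new_id[r] != -1:
            continue  # already reached by an earlier BFS
        new_id[r] = next_id
        next_id += 1
        queue = deque([r])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if new_id[u] == -1:
                    new_id[u] = next_id
                    next_id += 1
                    queue.append(u)
    return new_id


def relabel(adj, new_id):
    """Rebuild the adjacency lists under the new vertex identifiers."""
    new_adj = [[] for _ in range(len(adj))]
    for v, nbrs in enumerate(adj):
        new_adj[new_id[v]] = sorted(new_id[u] for u in nbrs)
    return new_adj
```

For a star centered at vertex 2 (leaves 0, 1, and 3) with an extra edge 3-4, `bfs_order` returns `[1, 2, 0, 3, 4]`: the hub receives identifier 0 and its neighbors follow immediately, so the hub's adjacency list refers to a contiguous range of identifiers.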