Abstract: To process complex, large-scale graph data, numerous distributed graph-parallel computing platforms have been proposed. However, excessive communication among computing nodes in these systems not only aggravates the network I/O workload of the underlying hardware but may also degrade runtime performance and scalability. In this paper, we propose and implement a system called Ligraph, which processes large-scale graph data in distributed mode with lightweight communication overhead. Ligraph is similar to the PowerGraph system but adds three new features: (1) a computing model based on Gather partial sum differences; (2) a corresponding lightweight Gather communication mechanism; and (3) for PageRank-like algorithms, a lightweight synchronizing communication mechanism and an edge direction-aware graph partition strategy proposed in our earlier work LightGraph, which is designed specifically for PageRank-like algorithms. We have conducted extensive experiments on real-world data sets, and the results verify that Ligraph reduces communication overhead and improves runtime performance and scalability compared with PowerGraph and LightGraph. For example, compared with PowerGraph under the Random partition scenario, Ligraph reduces communication overhead by up to 35.2 percent and runtime by up to 21.8 percent for the PageRank algorithm on the Twitter data set. Our experimental results also demonstrate that Ligraph outperforms several other representative existing systems in graph computing rate.
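
The abstract names a Gather computing model based on partial sum differences but does not spell out its mechanics. The sketch below shows one plausible reading, offered as an assumption rather than as Ligraph's actual implementation: each mirror replica of a vertex remembers the partial Gather sum it last reported and sends only the difference, skipping the message when nothing has changed. The names `Mirror`, `Master`, `gather_message`, and `tol` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    """Hypothetical mirror replica of a vertex on one machine."""
    last_sent: float = 0.0  # partial sum the master already knows about

    def gather_message(self, local_contributions, tol=0.0):
        """Return the difference to send, or None if it is within tol."""
        partial = sum(local_contributions)
        delta = partial - self.last_sent
        if abs(delta) <= tol:
            return None  # no network message needed this iteration
        self.last_sent = partial
        return delta


@dataclass
class Master:
    """Hypothetical master replica that accumulates differences."""
    total: float = 0.0
    messages_received: int = 0

    def receive(self, delta):
        # A None delta stands for a message that was never sent.
        if delta is not None:
            self.total += delta
            self.messages_received += 1


# Two mirrors report to one master over two iterations (assumed data).
m1, m2, master = Mirror(), Mirror(), Master()
for contributions_1, contributions_2 in [([0.5, 0.25], [1.0]),
                                         ([0.5, 0.25], [1.5])]:
    master.receive(m1.gather_message(contributions_1))
    master.receive(m2.gather_message(contributions_2))
```

In this toy run the first mirror's partial sum is unchanged in the second iteration, so it sends nothing, while the master's accumulated total still equals the sum of the current partial sums. Whether Ligraph suppresses unchanged messages in exactly this way is not stated in the abstract.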

Efficient Subgraph Matching on Billion Node Graphs

How to Partition a Billion-Node Graph

Towards Distributed Node Similarity Search on Graphs

Top-k Subgraph Matching Query in a Large Graph

Answering Subgraph Queries over Massive Disk Resident Graphs

Big Graph Management Based on Scalable Computing Platforms

Approximate Subgraph Matching Query over Large Graph

Efficient Processing of Very Large Graphs in a Small Cluster

SQBC: an Efficient Subgraph Matching Method over Large and Dense Graphs

Scaling Hop-Based Reachability Indexing for Fast Graph Pattern Query Processing

Holistic Subgraph Search Over Large Graphs

Subgraph Matching with Set Similarity in a Large Graph Database

Graphine: Programming Graph-Parallel Computation of Large Natural Graphs on Multicore Cluster

Distributed Reachability Queries on Massive Graphs

Distributed Structural Clustering on Large Graph

Evaluating Large Graph Processing in MapReduce Based on Message Passing

Subgraph Search over Massive Disk Resident Graphs

A Distributed Graph-Parallel Computing System with Lightweight Communication Overhead

High-Performance Massive Subgraph Counting Using Pipelined Adaptive-Group Communication

Hybrid Subgraph Matching Framework Powered by Sketch Tree for Distributed Systems

Towards Efficient Distributed Subgraph Enumeration Via Backtracking-Based Framework