Abstract: MapReduce has become one of the most popular parallel computing paradigms in the cloud, owing to the high scalability, reliability, and fault tolerance it achieves for a wide variety of big data processing applications. In the literature, the MapReduce Class (MRC) and the Minimal MapReduce Class (MMC) define the memory consumption, communication cost, CPU cost, and number of MapReduce rounds allowed for an algorithm executing in MapReduce. However, neither is designed for big graph processing in MapReduce: the constraints in MMC can hardly be achieved simultaneously on graphs, and the conditions in MRC may induce scalability problems when processing big graph data. In this paper, we study scalable big graph processing in MapReduce. We introduce a Scalable Graph processing Class (SGC) by relaxing some constraints in MMC to make it suitable for scalable graph processing. We define two graph join operators in SGC, namely EN join and NE join, with which a wide range of graph algorithms can be designed, including PageRank, breadth-first search, graph keyword search, Connected Component (CC) computation, and Minimum Spanning Forest (MSF) computation. Remarkably, to the best of our knowledge, for the two fundamental graph problems of CC and MSF computation, this is the first work that achieves $O(\log n)$ MapReduce rounds with $O(n+m)$ total communication cost in each round and constant memory consumption on each machine, where $n$ and $m$ are the numbers of nodes and edges in the graph, respectively. We conducted extensive performance studies using two web-scale graphs, Twitter and Friendster, which have different graph characteristics. The experimental results demonstrate that our algorithms achieve high scalability in big graph processing.
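
As a rough illustration of how an iterative graph algorithm such as PageRank decomposes into MapReduce rounds, the following is a minimal single-process sketch in plain Python. It is not the paper's NE join or EN join formulation, which the abstract does not spell out. The function names (`map_phase`, `reduce_phase`, `pagerank`), the adjacency-list input, and the damping factor of 0.85 are all assumptions made for illustration, and the sketch assumes every node has at least one outgoing edge.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each node sends an equal share of its rank along every out-edge."""
    for node, neighbors in graph.items():
        share = ranks[node] / len(neighbors)  # assumes len(neighbors) >= 1
        for nbr in neighbors:
            yield nbr, share


def reduce_phase(pairs, nodes, damping=0.85):
    """Reduce step: sum the shares received per node and apply damping."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    n = len(nodes)
    return {v: (1.0 - damping) / n + damping * sums[v] for v in nodes}


def pagerank(graph, rounds=20, damping=0.85):
    """Run a fixed number of simulated map/reduce rounds (hypothetical driver)."""
    nodes = list(graph)
    ranks = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(rounds):
        ranks = reduce_phase(map_phase(graph, ranks), nodes, damping)
    return ranks


if __name__ == "__main__":
    # Tiny hypothetical graph: adjacency lists of out-neighbors.
    example = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for node, rank in sorted(pagerank(example).items()):
        print(node, round(rank, 4))
```

In this sketch each round emits one key-value pair per edge, so the per-round communication is proportional to the number of edges, and each reduce key only accumulates a running sum. A real MapReduce job would also need to ship the adjacency lists between rounds, which this in-memory version sidesteps.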

Towards Scalable Subgraph Pattern Matching over Big Graphs on MapReduce

Distributed Affinity Propagation Clustering Based on MapReduce

Scalable Subgraph Enumeration in MapReduce: a Cost-Oriented Approach

Scalable Big Graph Processing in MapReduce

Efficient Algorithms for Summarizing Graph Patterns

Evaluating Large Graph Processing in MapReduce Based on Message Passing

Parallel Algorithms for Flexible Pattern Matching on Big Graphs

Efficient subgraph similarity all-matching

Efficiently extracting frequent subgraphs using MapReduce

A Survey and Experimental Analysis of Distributed Subgraph Matching

Combination of in-memory graph computation with MapReduce: a subgraph-centric method of PageRank

Efficient Subgraph Matching on Billion Node Graphs

A Parallel Computing Model for Large-Graph Mining with MapReduce

GraphPar: Efficient Workload-Aware Subgraph Matching System on Multiple GPUs

Hybrid Subgraph Matching Framework Powered by Sketch Tree for Distributed Systems

Tuning the granularity of parallelism for distributed graph processing

Distributed Subgraph Matching on Timely Dataflow [Experiments and Analyses]

Distributed Subgraph Matching on Timely Dataflow

RDF Subgraph Query Based on Common Subgraph in Distributed Environment

Distributed structural clustering on large graph

GraphPi: High Performance Graph Pattern Matching through Effective Redundancy Elimination