Abstract: MapReduce has become one of the most popular parallel computing paradigms in the cloud, owing to the high scalability, reliability, and fault tolerance it achieves for a large variety of big data processing applications. In the literature, the MapReduce Class (MRC) and the Minimal MapReduce Class (MMC) define the memory consumption, communication cost, CPU cost, and number of MapReduce rounds for an algorithm to execute in MapReduce. However, neither of them is designed for big graph processing in MapReduce: the constraints in MMC can hardly be achieved simultaneously on graphs, and the conditions in MRC may induce scalability problems when processing big graph data. In this paper, we study scalable big graph processing in MapReduce. We introduce a Scalable Graph processing Class (SGC) by relaxing some constraints in MMC to make it suitable for scalable graph processing. We define two graph join operators in SGC, namely, EN join and NE join, with which a wide range of graph algorithms can be designed, including PageRank, breadth-first search, graph keyword search, Connected Component (CC) computation, and Minimum Spanning Forest (MSF) computation. Remarkably, to the best of our knowledge, for the two fundamental graph problems of CC and MSF computation, this is the first work that achieves $O(\log n)$ MapReduce rounds with $O(n+m)$ total communication cost in each round and constant memory consumption on each machine, where $n$ and $m$ are the numbers of nodes and edges in the graph, respectively. We conducted extensive performance studies using two web-scale graphs, Twitter and Friendster, which have different graph characteristics. The experimental results demonstrate that our algorithms achieve high scalability in big graph processing.
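The abstract names PageRank as one algorithm expressible with the two join operators but does not define them here. As a rough illustration only, the sketch below assumes that a join of this kind attaches each node's value to its outgoing edges and then aggregates the results per destination node. The function name `pagerank_round`, its parameters, and the single-process dictionaries standing in for MapReduce key-value streams are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def pagerank_round(ranks, edges, damping=0.85):
    """One PageRank round written as a join followed by a per-node aggregation.

    ranks: dict mapping node -> current rank (hypothetical node table)
    edges: list of (src, dst) pairs (hypothetical edge table)
    Assumption: this only mimics the shape of a node/edge join in one process;
    it is not the paper's NE join or EN join operator.
    """
    n = len(ranks)

    # Count outgoing edges per node so each rank can be split evenly.
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1

    # Join step: look up the rank of each edge's source node and emit a
    # contribution keyed by the destination node.
    contributions = defaultdict(float)
    for src, dst in edges:
        contributions[dst] += ranks[src] / out_degree[src]

    # Aggregation step: combine the contributions received by each node.
    # Nodes without outgoing edges simply lose their mass in this sketch.
    return {
        node: (1.0 - damping) / n + damping * contributions.get(node, 0.0)
        for node in ranks
    }


if __name__ == "__main__":
    ranks = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
    edges = [(0, 1), (1, 2), (2, 0)]
    for _ in range(5):
        ranks = pagerank_round(ranks, edges)
    print(ranks)
```

Each round touches every node and every edge once, which is in line with the $O(n+m)$ per-round communication cost quoted in the abstract, although a real MapReduce job would partition both tables across machines rather than hold them in memory.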

Implementing quasi-parallel breadth-first search in MapReduce for large-scale social network mining

Large-Scale Social Network Analysis Based on MapReduce

Evaluating Large Graph Processing in MapReduce Based on Message Passing

Community structure mining in big data social media networks with MapReduce

A Parallel Computing Model for Large-Graph Mining with MapReduce

A Parallel Community Structure Mining Method In Big Social Networks

Maximal Influence Spread for Social Network Based on MapReduce

Research on method for extracting large-scale social network based on Mapreduce

LI-MR: A Local Iteration Map/Reduce Model and Its Application to Mine Community Structure in Large-Scale Networks

Scalable Big Graph Processing in MapReduce

Distributed Centrality Analysis of Social Network Data Using MapReduce

A MapReduce and Information Compression Based Social Community Structure Mining Method

Efficiently extracting frequent subgraphs using MapReduce

Scalable Community Discovery of Large Networks

Combination of in-memory graph computation with mapreduce: a subgraph-centric method of pagerank

A Parallel And Scalable Framework For Non-Overlapping Community Detection Algorithms

Visual analysis of retweeting propagation network in a microblogging platform

Enumerating Maximal Bicliques from a Large Graph using MapReduce

Parallelized Similarity Flooding Algorithm for Processing Large Scale Graph Datasets with MapReduce

Parallelizing the Extraction of Fresh Information from Online Social Networks

Parallel Algorithm for Discovering Communities in Large-Scale Complex Networks