Abstract: Big graph-structured data pervade our world, from the microscopic scale, such as gene regulatory networks, to the macroscopic scale, such as social networks. Subgraph matching is a fundamental operation for many graph applications, such as graph databases and graph mining. However, existing sequential algorithms have limited applicability to large graphs because subgraph isomorphism is NP-complete and because large graphs are stored in a distributed fashion. There is therefore a need to parallelize subgraph matching over big graph data in a distributed environment. With MapReduce as the backdrop, this paper proposes a new approach, named ParMa, for efficient subgraph matching on distributed platforms. ParMa consists of alternating computation and communication phases. We first build a cost model and then propose approaches to optimize the execution process. Unlike existing parallel approaches, which consider only the intermediate result size, the proposed cost model takes the number of iteration invocations as the primary cost. Based on this, our optimizations focus mainly on the aspects that affect the number of iterations throughout the matching process. One is query decomposition: we propose an effective query decomposition approach that minimizes the number of subqueries and their matches. The other is join processing: we introduce a suite of mechanisms, including join plan making, local join processing, and join cost estimation, to join partial matches in a way that reduces the join cost. Finally, our extensive experiments on both synthetic and real graphs demonstrate that ParMa outperforms state-of-the-art solutions by considerable margins.
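The decompose-then-join idea summarized in the abstract can be illustrated with a small, purely sequential sketch. This is not ParMa's algorithm: it assumes the simplest possible decomposition (one subquery per query edge), an undirected, vertex-labeled simple graph held in memory, and a naive nested-loop join. All function and variable names (`match_edge`, `join`, `subgraph_matches`) are hypothetical and chosen only for illustration.

```python
def match_edge(data_edges, labels, qu, qv, qlabels):
    """Find partial matches for one query edge (qu, qv).

    Each match is a dict mapping query vertices to data vertices.
    The data graph is treated as undirected, so both orientations are tried.
    """
    matches = []
    for a, b in data_edges:
        if a == b:
            continue  # a self-loop cannot match two distinct query vertices
        for x, y in ((a, b), (b, a)):
            if labels[x] == qlabels[qu] and labels[y] == qlabels[qv]:
                matches.append({qu: x, qv: y})
    return matches


def join(left, right):
    """Join two lists of partial matches on their shared query vertices."""
    joined = []
    for m1 in left:
        for m2 in right:
            shared = m1.keys() & m2.keys()
            if any(m1[k] != m2[k] for k in shared):
                continue  # the two partial matches disagree on a shared vertex
            merged = {**m1, **m2}
            # Subgraph isomorphism requires an injective mapping.
            if len(set(merged.values())) == len(merged):
                joined.append(merged)
    return joined


def subgraph_matches(data_edges, labels, query_edges, qlabels):
    """Match every query edge, then join the partial matches one edge at a time."""
    partial = None
    for qu, qv in query_edges:
        edge_matches = match_edge(data_edges, labels, qu, qv, qlabels)
        partial = edge_matches if partial is None else join(partial, edge_matches)
    return partial or []


# Toy data graph: a triangle 1-2-3 plus a pendant vertex 4 attached to 3.
data_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
labels = {1: "A", 2: "B", 3: "C", 4: "A"}

# Query: a triangle whose vertices are labeled A, B, and C.
query_edges = [("u", "v"), ("v", "w"), ("u", "w")]
qlabels = {"u": "A", "v": "B", "w": "C"}

result = subgraph_matches(data_edges, labels, query_edges, qlabels)
print(result)
```

In this sketch every join is one step, so a query with three edges needs two joins. Under the iteration-count cost model described above, the natural improvement would be to use fewer, larger subqueries so that fewer join rounds are required; the sketch makes no attempt at that optimization.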
