Towards Scalable Subgraph Pattern Matching over Big Graphs on MapReduce.

Bo Suo, Zhanhuai Li, Qun Chen, Wei Pan
DOI: https://doi.org/10.1109/icpads.2016.0147
2016-01-01
Abstract: Big graph-structured data pervade our world, ranging from the microscopic, such as gene regulatory networks, to the macroscopic, such as social networks. Subgraph matching is a fundamental operation for many graph applications, such as graph databases and graph mining. However, existing sequential algorithms have limited applicability to large graphs, both because subgraph isomorphism is inherently NP-complete and because large graphs are stored in a distributed fashion. Therefore, there is a need to parallelize subgraph matching over big graph data in a distributed environment. With MapReduce as the backdrop, this paper proposes a new approach, named ParMa, for efficient subgraph matching on distributed platforms. It consists of alternating computation and communication phases. We first build a cost model and then propose approaches to optimize the execution process. Unlike existing parallel approaches, which consider only the intermediate result size, the proposed cost model takes the number of iteration invocations as the primary cost. Based on this, our optimizations focus mainly on the aspects that affect the number of iterations throughout the execution of matching. One is query decomposition: we propose an effective query decomposition approach that minimizes the number of subqueries and their matches. The other is join processing: we introduce a suite of mechanisms, including join planning, local join processing, and join cost estimation, to join partial matches in a way that reduces join cost. Finally, our extensive experiments on both synthetic and real graphs demonstrate that ParMa outperforms state-of-the-art solutions by considerable margins.
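The decompose-then-join idea described in the abstract can be illustrated with a minimal single-machine sketch. This is not ParMa's algorithm: the abstract does not specify its decomposition or join plan, so the sketch assumes the simplest possible choices (one subquery per query edge, joined left to right) on an undirected, unlabeled graph in which each edge is listed once. All function names are hypothetical. The `rounds` counter stands in for the number of join iterations, which the abstract names as the primary cost.

```python
def match_edge(query_edge, data_edges):
    """Return partial matches (query vertex -> data vertex) for one query edge."""
    u, v = query_edge
    matches = []
    for a, b in data_edges:
        # Undirected graph: a query edge can map onto a data edge in either direction.
        for x, y in ((a, b), (b, a)):
            if x != y:  # ignore self-loops
                matches.append({u: x, v: y})
    return matches


def join(left, right):
    """Join two lists of partial matches on their shared query vertices."""
    joined = []
    for m1 in left:
        for m2 in right:
            # Shared query vertices must map to the same data vertex.
            if any(m1[k] != m2[k] for k in m1.keys() & m2.keys()):
                continue
            merged = {**m1, **m2}
            # Subgraph isomorphism requires an injective mapping.
            if len(set(merged.values())) == len(merged):
                joined.append(merged)
    return joined


def subgraph_match(query_edges, data_edges):
    """Assumed plan: one subquery per query edge, joined left to right.

    Returns the full matches and the number of join rounds performed.
    """
    partials = [match_edge(e, data_edges) for e in query_edges]
    result = partials[0]
    rounds = 0
    for p in partials[1:]:
        result = join(result, p)
        rounds += 1  # each join would be one iteration in an iterative setting
    return result, rounds


# Example: a triangle query over a data graph containing one triangle (1-2-3)
# plus a dangling edge (3-4).
query = [("a", "b"), ("b", "c"), ("a", "c")]
data = [(1, 2), (2, 3), (1, 3), (3, 4)]
matches, rounds = subgraph_match(query, data)
```

With edge-sized subqueries, a query with k edges needs k - 1 join rounds, and the first joins can produce many partial matches that are later discarded. Decomposing into fewer, larger subqueries reduces both the number of rounds and the number of partial matches, which is the motivation the abstract gives for its query decomposition step.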