Abstract: We investigate the problem of summarizing frequent subgraphs by a smaller set of representative patterns. We show that some special graph patterns, called δ-jump patterns in this paper, must be representative patterns. Based on this fact, we devise two algorithms, RP-FP and RP-GD, to mine a representative set that summarizes frequent subgraphs. RP-FP derives a representative set from frequent closed subgraphs, whereas RP-GD mines a representative set directly from graph databases. Three novel heuristic strategies, Last-Succeed-First-Check, Reverse-Path-Trace, and Nephew-Representative-Based-Cover, are proposed to further improve the efficiency of RP-GD. RP-FP provides a tight ratio bound but incurs a heavy computation cost. RP-GD provides no ratio bound guarantee but is more efficient than RP-FP. We also exploit the similarity between sibling branches in the graph pattern space to devise a third, much more efficient algorithm, RP-Leap, which mines a representative set that approximately summarizes frequent subgraphs. Our extensive experiments on both real and synthetic data sets verify the summarization quality and efficiency of our algorithms. To further demonstrate the interestingness of representative patterns, we study an application of representative patterns to classification. We demonstrate that the classification accuracy achieved by a representative pattern-based model is no less than that achieved by a closed graph pattern-based model.

Efficiently extracting frequent subgraphs using MapReduce

Distributed Affinity Propagation Clustering Based on MapReduce

Efficient Algorithms for Summarizing Graph Patterns

Evaluating Large Graph Processing in MapReduce Based on Message Passing

Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism

A Parallel Computing Model for Large-Graph Mining with MapReduce

Extracting Frequent Connected Subgraphs from Large Graph Sets

Large Graph Sampling Algorithm for Frequent Subgraph Mining

Combination of in-memory graph computation with MapReduce: a subgraph-centric method of PageRank

Towards Efficient Subgraph Search In Cloud Computing Environments

Distributed structural clustering on large graph

Exploring Computation Locality of Graph Mining Algorithms on MapReduce

JPMiner: Mining Frequent Jump Patterns from Graph Databases

MIRAGE: An Iterative MapReduce based Frequent Subgraph Mining Algorithm

Distributed data management using MapReduce

Efficient Mining of Frequent Subgraphs with Two-Vertex Exploration

Parallelized Similarity Flooding Algorithm for Processing Large Scale Graph Datasets with MapReduce

Enumerating Maximal Bicliques from a Large Graph using MapReduce

Automatic Parallelization of Graph Queries with MapReduce

HE-Gaston algorithm for frequent subgraph mining with Hadoop framework

Efficient Query Evaluation on Distributed Graphs with Hadoop Environment