Abstract: We investigate the problem of summarizing frequent subgraphs by a smaller set of representative patterns. We show that certain special graph patterns, which we call δ-jump patterns, must be representative patterns. Based on this fact, we devise two algorithms, RP-FP and RP-GD, to mine a representative set that summarizes frequent subgraphs. RP-FP derives a representative set from frequent closed subgraphs, whereas RP-GD mines a representative set directly from graph databases. Three novel heuristic strategies, Last-Succeed-First-Check, Reverse-Path-Trace, and Nephew-Representative-Based-Cover, are proposed to further improve the efficiency of RP-GD. RP-FP provides a tight ratio bound but incurs a heavy computational cost. RP-GD offers no ratio bound guarantee but is more efficient than RP-FP. We also exploit the similarity between sibling branches in the graph pattern space to devise a much more efficient algorithm, RP-Leap, which mines a representative set that approximately summarizes frequent subgraphs. Our extensive experiments on both real and synthetic data sets verify the summarization quality and efficiency of our algorithms. To further demonstrate the interestingness of representative patterns, we study an application of representative patterns to classification. We demonstrate that the classification accuracy achieved by a representative-pattern-based model is no less than that achieved by a closed-graph-pattern-based model.

Finding representative set from massive data

Finding an λ-representative subset from massive data

Continuously Extracting High-Quality Representative Set from Massive Data Streams

How “small” Reflects “large”?—representative Information Measurement and Extraction

Efficient Algorithms for Summarizing Graph Patterns

A heuristic approach for λ-representative information retrieval from large-scale data

Continuously identifying representatives out of massive streams

Representative Selection Based on Sparse Modeling

Sampling Representative Users From Large Social Networks

Finding Representative and Diverse Vertices within Graphs

Towards a Compact and Effective Representation for Datasets with Inhomogeneous Clusters

Extracting representative information to enhance flexible data queries

A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data

Parallel DEA-Dantzig-Wolfe Algorithm for Massive Data Applications

Mining Representative Subspace Clusters in High-dimensional Data

A Sampling-Based Density Peaks Clustering Algorithm for Large-Scale Data

Study of Seamless Organization and Storage Structure for Massive Spatio-Temporal Data

A Dataset Representativeness Metric and A Slicing Sampling Strategy for the Kennard-Stone Algorithm

Detecting Associations in Large Dataset on MapReduce

Distributed Statistical Inference for Massive Data

Efficient Protocols for Collecting Histograms in Large-Scale RFID Systems