Efficient Algorithms for Summarizing Graph Patterns

Jianzhong Li, Yong Liu, Hong Gao
DOI: https://doi.org/10.1109/tkde.2010.249
IF: 9.235
2010-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: We investigate the problem of summarizing frequent subgraphs by a smaller set of representative patterns. We show that some special graph patterns, called δ-jump patterns in this paper, must be representative patterns. Based on this fact, we devise two algorithms, RP-FP and RP-GD, to mine a representative set that summarizes frequent subgraphs. RP-FP derives a representative set from frequent closed subgraphs, whereas RP-GD mines a representative set directly from graph databases. Three novel heuristic strategies, Last-Succeed-First-Check, Reverse-Path-Trace, and Nephew-Representative-Based-Cover, are proposed to further improve the efficiency of RP-GD. RP-FP can provide a tight ratio bound but has a heavy computational cost. RP-GD cannot provide a ratio bound guarantee but is more efficient than RP-FP. We also exploit the similarity between sibling branches in the graph pattern space to devise another, much more efficient algorithm, RP-Leap, which mines a representative set that approximately summarizes frequent subgraphs. Our extensive experiments on both real and synthetic data sets verify the summarization quality and efficiency of our algorithms. To further demonstrate the interestingness of representative patterns, we study their application to classification. We demonstrate that the classification accuracy achieved by the representative pattern-based model is no less than that achieved by the closed graph pattern-based model.
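The abstract does not spell out the covering criterion behind a "representative set". The Python sketch below illustrates one common formulation from the pattern-summarization literature, under stated assumptions: a representative pattern P δ-covers a pattern Q when Q is a subpattern of P and the Jaccard distance between their supporting graph sets is at most δ, and representatives are picked by greedy set cover, the classic way to obtain a logarithmic ratio bound. It is not the authors' RP-FP, RP-GD, or RP-Leap. The names `distance`, `covers`, and `greedy_representatives` are hypothetical, and subgraph containment is approximated by edge-set containment instead of a subgraph-isomorphism test.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical simplification: a pattern is a frozenset of labeled edges, and
# "Q is a subpattern of P" is approximated by edge-set containment (Q <= P).
# Real graph patterns would need a subgraph-isomorphism test instead.
Edge = Tuple[str, str, str]  # (node label, edge label, node label)
Pattern = FrozenSet[Edge]


def distance(sup_p: Set[int], sup_q: Set[int]) -> float:
    """Jaccard distance between the supporting graph-ID sets of two patterns."""
    union = sup_p | sup_q
    if not union:
        return 0.0
    return 1.0 - len(sup_p & sup_q) / len(union)


def covers(p: Pattern, q: Pattern, sup: Dict[Pattern, Set[int]], delta: float) -> bool:
    """Assumed delta-cover: Q is a subpattern of P and their supports are delta-close."""
    return q <= p and distance(sup[p], sup[q]) <= delta


def greedy_representatives(sup: Dict[Pattern, Set[int]], delta: float) -> List[Pattern]:
    """Greedy set cover: repeatedly pick the pattern covering the most uncovered patterns."""
    patterns = list(sup)
    cover_sets = {p: {q for q in patterns if covers(p, q, sup, delta)} for p in patterns}
    uncovered = set(patterns)
    chosen: List[Pattern] = []
    while uncovered:
        # Every pattern covers itself, so the best gain is at least 1 and the loop ends.
        best = max(patterns, key=lambda p: len(cover_sets[p] & uncovered))
        chosen.append(best)
        uncovered -= cover_sets[best]
    return chosen


# Toy usage: three frequent patterns with made-up supporting graph IDs.
a = frozenset({("C", "-", "C")})
ab = frozenset({("C", "-", "C"), ("C", "=", "O")})
n = frozenset({("C", "-", "N")})
sup = {a: {1, 2, 3, 4}, ab: {1, 2, 3}, n: {1, 5}}
reps = greedy_representatives(sup, delta=0.3)
strict_reps = greedy_representatives(sup, delta=0.2)
```

With δ = 0.3 the larger pattern `ab` absorbs its subpattern `a` (support drops from 4 to 3 graphs, distance 0.25), so two representatives, `ab` and `n`, summarize three patterns; with δ = 0.2 no absorption happens and all three patterns are kept.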
What problem does this paper attempt to address?