Optimal Enumeration
Lijun Chang,Xuemin Lin,Wenjie Zhang,Jeffrey Xu Yu,Ying Zhang,Lu Qin
DOI: https://doi.org/10.14778/2735479.2735486
IF: 2.5
2015-01-01
Proceedings of the VLDB Endowment
Abstract:Driven by many real applications, graph pattern matching has attracted a great deal of attention recently. Consider that a twig-pattern matching may result in an extremely large number of matches in a graph; this may not only confuse users by providing too many results but also lead to high computational costs. In this paper, we study the problem of top- k tree pattern matching; that is, given a rooted tree T , compute its top- k matches in a directed graph G based on the twig-pattern matching semantics. We firstly present a novel and optimal enumeration paradigm based on the principle of Lawler's procedure. We show that our enumeration algorithm runs in O ( n T + log k ) time in each round where n T is the number of nodes in T. Considering that the time complexity to output a match of T is O ( n T ) and n T ≥ log k in practice, our enumeration technique is optimal. Moreover, the cost of generating top-1 match of T in our algorithm is O ( m R ) where m R is the number of edges in the transitive closure of a data graph G involving all relevant nodes to T. O ( m R ) is also optimal in the worst case without pre-knowledge of G. Consequently, our algorithm is optimal with the running time O ( m R + k ( n T + log k )) in contrast to the time complexity O ( m R log k + kn T (log k + d T )) of the existing technique where d T is the maximal node degree in T. Secondly, a novel priority based access technique is proposed, which greatly reduces the number of edges accessed and results in a significant performance improvement. Finally, we apply our techniques to the general form of top- k graph pattern matching problem (i.e., query is a graph) to improve the existing techniques. Comprehensive empirical studies demonstrate that our techniques may improve the existing techniques by orders of magnitude.