Supergraph Search in Graph Databases Via Hierarchical Feature-Tree
Bingqing Lyu, Lu Qin, Xuemin Lin, Lijun Chang, Jeffrey Xu Yu
DOI: https://doi.org/10.1109/tkde.2018.2833124
IF: 9.235
2019-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Supergraph search is a fundamental problem in graph databases with many applications. Given a graph database and a query-graph, supergraph search retrieves all data-graphs in the database that are contained in the query-graph. Most existing solutions follow the pruning-and-verification framework, which prunes false answers based on features in the pruning phase and performs subgraph isomorphism tests on the remaining graphs in the verification phase. However, they do not scale to large data-graphs and query-graphs because of three drawbacks. First, they rely on a frequent subgraph mining algorithm to select features, which is expensive and cannot generate large features. Second, they require a costly verification phase. Third, they process features in a fixed order without considering their relationships to the query-graph. In this paper, we address the three drawbacks and propose new indexing and query processing algorithms. In indexing, we select features directly from the data-graphs without expensive frequent subgraph mining. The features form a feature-tree that contains features of all sizes, and both the cost sharing and the pruning power of the features are considered. In query processing, we propose a new algorithm in which the order of processing features is query-dependent and takes both the cost sharing and the pruning power into account. We explore two optimization strategies to further improve the efficiency of the algorithm. The first strategy applies a lightweight graph compression technique, and the second strategy optimizes the inclusion of answers. We further show how to maintain the index incrementally and efficiently when the graph database is updated dynamically. Moreover, we propose an approximation approach that significantly reduces the computational cost for large data-graphs and/or query-graphs while preserving high result quality. Finally, we conduct extensive performance studies on two large real datasets to demonstrate the efficiency and effectiveness of our algorithms.
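
The pruning-and-verification framework that the abstract attributes to existing solutions can be illustrated with a small sketch. It is not the paper's feature-tree algorithm: the names `Graph`, `is_subgraph`, and `supergraph_search` are hypothetical, the features are hand-picked rather than selected from the data-graphs, they are processed in a fixed order, and containment is assumed to mean label-preserving, non-induced subgraph isomorphism. The pruning rule is that if a feature is contained in a data-graph but not in the query-graph, that data-graph cannot be contained in the query-graph.

```python
class Graph:
    """Undirected vertex-labeled graph (illustrative helper, not from the paper)."""

    def __init__(self, labels, edges):
        self.labels = dict(labels)              # vertex -> label
        self.adj = {v: set() for v in self.labels}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def is_subgraph(small, big):
    """True if `small` maps injectively into `big`, preserving labels and edges."""
    order = list(small.labels)

    def extend(i, mapping, used):
        if i == len(order):
            return True
        u = order[i]
        for c in big.labels:
            if c in used or big.labels[c] != small.labels[u]:
                continue
            # Every already-mapped neighbour of u must be adjacent to c in `big`.
            if all(mapping[w] in big.adj[c] for w in small.adj[u] if w in mapping):
                mapping[u] = c
                used.add(c)
                if extend(i + 1, mapping, used):
                    return True
                del mapping[u]
                used.discard(c)
        return False

    return extend(0, {}, set())


def supergraph_search(database, features, query):
    """Return ids of data-graphs contained in `query` (prune, then verify)."""
    # Index: feature name -> ids of data-graphs containing that feature.
    # A real system would build this offline; it is rebuilt here for brevity.
    index = {
        name: {gid for gid, g in database.items() if is_subgraph(f, g)}
        for name, f in features.items()
    }
    # Pruning phase: a feature absent from the query rules out every
    # data-graph that contains it.
    candidates = set(database)
    for name, f in features.items():
        if not is_subgraph(f, query):
            candidates -= index[name]
    # Verification phase: exact containment test on the survivors.
    return sorted(gid for gid in candidates if is_subgraph(database[gid], query))


# Toy example: the query is the labeled path A - B - C.
query = Graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)])
database = {
    "g1": Graph({0: "A", 1: "B"}, [(0, 1)]),
    "g2": Graph({0: "B", 1: "C"}, [(0, 1)]),
    "g3": Graph({0: "A", 1: "C"}, [(0, 1)]),
    "g4": Graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2), (0, 2)]),
}
features = {"edge_AC": Graph({0: "A", 1: "C"}, [(0, 1)])}
print(supergraph_search(database, features, query))  # ['g1', 'g2']
```

In this toy run the single feature (an A-C edge) is absent from the query, so `g3` and `g4` are pruned without any verification, and only `g1` and `g2` reach the subgraph isomorphism test. The drawbacks listed in the abstract correspond to the simplifications made here: how features are chosen, how expensive verification is, and the fixed feature order.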