Efficient Keyword Search on Uncertain Graph Data
Ye Yuan,Guoren Wang,Lei Chen,Haixun Wang
DOI: https://doi.org/10.1109/tkde.2012.222
IF: 9.235
2013-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract:As a popular search mechanism, keyword search has been applied to retrieve useful data in documents, texts, graphs, and even relational databases. However, so far, there is no work on keyword search over uncertain graph data even though the uncertain graphs have been widely used in many real applications, such as modeling road networks, influential detection in social networks, and data analysis on PPI networks. Therefore, in this paper, we study the problem of top-k keyword search over uncertain graph data. Following the similar answer definition for keyword search over deterministic graphs, we consider a subtree in the uncertain graph as an answer to a keyword query if 1) it contains all the keywords; 2) it has a high score (defined by users or applications) based on keyword matching; and 3) it has low uncertainty. Keyword search over deterministic graphs is already a hard problem as stated in [1], [2], [3]. Due to the existence of uncertainty, keyword search over uncertain graphs is much harder. Therefore, to improve the search efficiency, we employ a filtering-and-verification strategy based on a probabilistic keyword index, PKIndex. For each keyword, we offline compute path-based top-k probabilities, and attach these values to PKIndex in an optimal, compressed way. In the filtering phase, we perform existence, path-based and tree-based probabilistic pruning phases, which filter out most false subtrees. In the verification, we propose a sampling algorithm to verify the candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.