Finding Top-K Min-Cost Connected Trees In Databases
Bolin Ding,Jeffrey Xu Yu,Shan Wang,Lu Qin,Xiao Zhang,Xuemin Lin
DOI: https://doi.org/10.1109/ICDE.2007.367929
2007-01-01
Abstract:It is widely realized that the integration of database and information retrieval techniques will provide users with a wide range of high quality services. In this paper we study processing an l-keyword query, p(1), p(2),...,p(l), against a relational database which can be modeled as a weighted graph, G(V, E). Here V is a set of nodes (tuples) and E is a set of edges representing foreign key references between tuples. Let V-i subset of V be a set of nodes that contain the keyword p(i). We study finding top-k minimum cost connected trees that contain at least one node in every subset V-i, and denote our problem as GST-k- When k = 1, it is known as a minimum cost group Steiner tree problem which is NP-Complete. We observe that the number of keywords, l, is small, and propose a novel parameterized solution, with l as a parameter to find the optimal GST-1, in time complexity O(3(l)n + 2(l)((l + log n)n + m)), where n and m are the numbers of nodes and edges in graph G. Our solution can handle graphs with a large number of nodes. Our GST-1 solution can be easily extended to support GST-k, which outperforms the existing GST-k solutions over both weighted undirected/directed graphs. We conducted extensive experimental studies, and report our finding.