Top-K Aggregate Queries on Continuous Probabilistic Datasets

Jianwen Chen, Ling Feng, Jun Zhang
DOI: https://doi.org/10.1007/978-3-642-38562-9_22
2013-01-01
Abstract: The top-K aggregate query, which ranks groups of tuples by their aggregate values and returns the K groups with the highest aggregates, is a crucial requirement in many domains such as information extraction, data integration, and sensor data processing. In this paper, we formulate top-K aggregate queries for the case where tuple scores are represented as continuous probability distributions, and we present algorithms to answer them. To further improve performance, we develop pruning techniques and an adaptive strategy that avoid computing the exact aggregate values of groups that are guaranteed not to be in the top K. Our experimental study demonstrates the efficiency of these techniques on several datasets with continuous attribute uncertainty.
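The abstract does not state the query semantics, the aggregate function, or the form of the pruning bounds, so the following is only a minimal Python sketch of the general idea under hypothetical assumptions that are not taken from the paper: each tuple score is an independent uniform distribution given as an interval (lo, hi), the group aggregate is MAX, groups are ranked by expected aggregate, and pruning uses the cheap bound E[max] <= max(hi). The names `expected_max`, `upper_bound`, and `top_k_max_groups` are illustrative. Groups are visited in descending order of their bound, and the costly numerical integration stops as soon as no remaining bound can beat the current K-th exact value.

```python
import heapq
from typing import Dict, List, Tuple

# HYPOTHETICAL model (not from the paper): each tuple score is an independent
# Uniform(lo, hi) random variable, written as the interval (lo, hi).
Interval = Tuple[float, float]


def uniform_cdf(x: float, lo: float, hi: float) -> float:
    """P(X <= x) for X ~ Uniform(lo, hi); a point mass when lo == hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def expected_max(scores: List[Interval], steps: int = 2000) -> float:
    """Approximate E[max_i X_i] by numerical integration (the costly step).

    M = max_i X_i always lies in [L, U] with L = max(lo_i), U = max(hi_i),
    so E[M] = L + integral from L to U of P(M > x) dx, where
    P(M > x) = 1 - prod_i F_i(x) by independence. Uses the midpoint rule.
    """
    low = max(lo for lo, _ in scores)
    high = max(hi for _, hi in scores)
    if high <= low:
        return low
    width = (high - low) / steps
    area = 0.0
    for step in range(steps):
        x = low + (step + 0.5) * width
        cdf_of_max = 1.0
        for lo, hi in scores:
            cdf_of_max *= uniform_cdf(x, lo, hi)
        area += (1.0 - cdf_of_max) * width
    return low + area


def upper_bound(scores: List[Interval]) -> float:
    """Cheap bound: E[max_i X_i] <= max_i hi_i, with no integration."""
    return max(hi for _, hi in scores)


def top_k_max_groups(
    groups: Dict[str, List[Interval]], k: int
) -> Tuple[List[Tuple[str, float]], int]:
    """Return the K groups with the highest expected MAX, plus how many
    groups needed the exact (integrated) value."""
    if k <= 0:
        return [], 0
    # Visit groups in descending order of their cheap upper bound.
    order = sorted(groups, key=lambda g: upper_bound(groups[g]), reverse=True)
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top K
    evaluated = 0
    for name in order:
        if len(heap) == k and upper_bound(groups[name]) <= heap[0][0]:
            # Every remaining group has an upper bound no larger than the
            # current K-th exact value, so none of them can enter the top K.
            break
        value = expected_max(groups[name])
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (value, name))
        elif value > heap[0][0]:
            heapq.heapreplace(heap, (value, name))
    ranked = sorted(((name, value) for value, name in heap),
                    key=lambda pair: pair[1], reverse=True)
    return ranked, evaluated
```

For example, with groups a = [(0, 10), (5, 6)], b = [(7, 8)], c = [(1, 3), (2, 4)], d = [(0, 2)] and K = 2, the sketch returns b (about 7.5) followed by a (about 6.52) and integrates only those two groups; c and d are skipped because their bounds (4 and 2) already fall below the second-best exact value. MAX is used here because its expectation generally needs integration while its bound is cheap; under SUM, linearity of expectation makes the exact expected value as cheap as any bound, so this particular pruning would gain nothing.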