Optimizing Quality for Probabilistic Skyline Computation and Probabilistic Similarity Search

Xiaoye Miao, Yunjun Gao, Linlin Zhou, Wei Wang, Qing Li
DOI: https://doi.org/10.1109/tkde.2018.2805824
2019-01-01
Abstract: Probabilistic queries often suffer from noisy result sets due to data uncertainty. In this paper, we propose an efficient optimization framework, termed QueryClean, for both probabilistic skyline computation and probabilistic similarity search. Its goal is to optimize query quality by selecting a group of uncertain objects to clean under a limited resource budget, where quality is measured by an entropy-based function. We develop an efficient index to organize the possible result sets of probabilistic queries, which helps avoid repeated probabilistic query evaluations over a large number of possible worlds when computing quality. Moreover, using two new heuristics, we present exact and approximate algorithms for the optimization problem. Extensive experiments on both real and synthetic data sets demonstrate the efficiency and scalability of QueryClean.
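To make the entropy-based quality notion concrete, the following is a minimal sketch and not the paper's implementation. It assumes the quality of a query is the negative Shannon entropy of the distribution over possible result sets, a definition the abstract does not spell out. The names `skyline`, `result_distribution`, and `quality`, the discrete-instance data model, and the toy data are all hypothetical. The brute-force enumeration of possible worlds shown here is the kind of repeated evaluation the abstract says the index is meant to avoid.

```python
import itertools
import math
from collections import defaultdict


def skyline(points):
    """Return the ids of points not dominated by any other (smaller is better)."""
    result = []
    for i, p in points.items():
        dominated = any(
            q != p and all(q[k] <= p[k] for k in range(len(p)))
            for j, q in points.items()
            if j != i
        )
        if not dominated:
            result.append(i)
    return frozenset(result)


def result_distribution(objects, query=skyline):
    """Enumerate every possible world and accumulate result-set probabilities.

    objects maps an id to a list of (instance, probability) pairs whose
    probabilities sum to 1 (an assumed discrete uncertainty model).
    """
    ids = sorted(objects)
    dist = defaultdict(float)
    for combo in itertools.product(*(objects[i] for i in ids)):
        prob = math.prod(p for _, p in combo)
        world = {i: inst for i, (inst, _) in zip(ids, combo)}
        dist[query(world)] += prob
    return dict(dist)


def quality(dist):
    """Negative Shannon entropy in bits; 0 means the result is unambiguous."""
    return sum(p * math.log2(p) for p in dist.values() if p > 0)


# Toy data: object "b" is uncertain between two instances.
objects = {
    "a": [((1, 4), 1.0)],
    "b": [((2, 2), 0.5), ((5, 5), 0.5)],
}
before = quality(result_distribution(objects))  # -1.0: two equally likely result sets

# "Cleaning" b is modeled here as resolving it to a single instance.
cleaned = dict(objects, b=[((2, 2), 1.0)])
after = quality(result_distribution(cleaned))  # 0.0: a single certain result set
```

In this toy case, cleaning the single uncertain object raises the quality from -1 to 0. Under a limited budget, the optimization problem is to choose which objects to clean so that the gain is largest; a full treatment would also account for the fact that the outcome of cleaning is not known in advance.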