Probabilistic Skyline on Incomplete Data

Kaiqi Zhang, Hong Gao, Xixian Han, Zhipeng Cai, Jianzhong Li
DOI: https://doi.org/10.1145/3132847.3132930
2017-01-01
Abstract: The skyline query is important in the database community. In recent years, research on incomplete data has received increasing attention, especially for the skyline query. However, the existing skyline definition on incomplete data cannot provide users with valuable references. In this paper, we propose a novel skyline definition that applies a probabilistic model to incomplete data, so that each point has a probability of being in the skyline. In particular, the query returns the K points with the highest skyline probabilities. Computing the probabilistic skyline on incomplete data is a major challenge. We propose an efficient algorithm, PISkyline, which uses two pruning strategies to reduce the number of points and adopts two optimizations to accelerate the probability computation for each point. Nevertheless, PISkyline is sensitive to the order of the input data, and there is still a great deal of room for optimization. We therefore develop a point-level sorting technique that adjusts the order in which points are accessed to further improve the efficiency of PISkyline. Our experimental results demonstrate that our algorithms are tens of times faster than the naive algorithm on both synthetic and real datasets.
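To make the query concrete, the following is a minimal sketch of a brute-force baseline, not the PISkyline algorithm and not necessarily the paper's probabilistic model. It assumes (these are illustrative assumptions, not taken from the abstract) that smaller values are better, that a missing value is written as `None`, and that each missing value is drawn independently and uniformly from a small discrete `domain`. Under those assumptions it enumerates every completion of the data and counts how often each point is undominated. The names `dominates`, `skyline_probabilities`, and `top_k` are hypothetical.

```python
from itertools import product


def dominates(q, p):
    """Return True if q dominates p: no worse on every dimension, better on one.

    Assumption: smaller values are preferred.
    """
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def skyline_probabilities(points, domain):
    """Skyline probability of each point by enumerating all completions.

    Assumption: every missing value (None) is independent and uniform over
    `domain`, so each completion ("possible world") is equally likely.
    The cost is exponential in the number of missing values.
    """
    missing = [(i, j) for i, p in enumerate(points)
               for j, v in enumerate(p) if v is None]
    counts = [0] * len(points)
    worlds = 0
    for fill in product(domain, repeat=len(missing)):
        world = [list(p) for p in points]
        for (i, j), v in zip(missing, fill):
            world[i][j] = v
        worlds += 1
        for i, p in enumerate(world):
            # p is in the skyline of this world if no other point dominates it.
            if not any(dominates(q, p) for k, q in enumerate(world) if k != i):
                counts[i] += 1
    return [c / worlds for c in counts]


def top_k(points, domain, k):
    """Return the k (index, probability) pairs with the highest probability."""
    probs = skyline_probabilities(points, domain)
    ranked = sorted(range(len(points)), key=lambda i: (-probs[i], i))
    return [(i, probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    data = [(1, None), (2, 2), (None, 1)]
    for index, prob in top_k(data, domain=(1, 2, 3), k=2):
        print(index, round(prob, 3))
```

Because the enumeration grows exponentially with the number of missing values, a baseline of this kind is only usable on toy inputs, which is the cost that pruning and faster probability computation are meant to avoid.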