Modeling and Computing Probabilistic Skyline on Incomplete Data.
Kaiqi Zhang, Hong Gao, Xixian Han, Zhipeng Cai, Jianzhong Li
DOI: https://doi.org/10.1109/tkde.2019.2904967
IF: 9.235
2019-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: The skyline query is an important operation in the database community. In recent years, research on incomplete data has received increasing attention, especially for the skyline query. However, the existing skyline definition on incomplete data cannot provide users with valuable reference results. In this paper, we propose a novel skyline definition based on a probabilistic model of incomplete data, in which each point has a probability of being in the skyline. In particular, the query returns the K points with the highest skyline probabilities. In addition, we propose models of incomplete data and estimate the probability density functions of missing values for independent, correlated, and anti-correlated distributions, respectively. Computing the probabilistic skyline on incomplete data is a major challenge. We propose three efficient algorithms, SPISkyline, SPCSkyline, and SPASkyline, for probabilistic skyline computation on incomplete data that follow independent, correlated, and anti-correlated distributions, respectively. They employ a pruning strategy, an optimized probability computation process, and a sorting technique to improve the efficiency of probabilistic skyline computation on incomplete data. Our experimental results demonstrate that the proposed concept of probabilistic skyline is an effective way to handle skyline queries on incomplete data, and that our algorithms are tens of times faster than the naive algorithm on both synthetic and real datasets.
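To make the query semantics concrete, the following is a minimal brute-force sketch of a top-K probabilistic skyline, roughly in the spirit of the naive baseline the abstract mentions. It is not SPISkyline, SPCSkyline, or SPASkyline, and it uses none of their pruning or sorting techniques. Several things here are assumptions for illustration and do not come from the paper: smaller values are treated as better, each missing value (written as `None`) is taken to be uniformly distributed over a small discrete set `candidates` instead of following an estimated density function, missing values are independent across points and dimensions, and all function names are hypothetical.

```python
from itertools import product


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are preferred in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def instantiations(point, candidates):
    """Yield (complete_point, probability) pairs for a point with None entries.

    Assumption: each missing value is uniform over `candidates` and independent
    of every other missing value. This stands in for an estimated density.
    """
    missing = [i for i, v in enumerate(point) if v is None]
    if not missing:
        yield tuple(point), 1.0
        return
    weight = 1.0 / len(candidates) ** len(missing)
    for combo in product(candidates, repeat=len(missing)):
        filled = list(point)
        for i, v in zip(missing, combo):
            filled[i] = v
        yield tuple(filled), weight


def skyline_probability(idx, points, candidates):
    """Probability that points[idx] is dominated by no other point."""
    total = 0.0
    for inst, w in instantiations(points[idx], candidates):
        not_dominated = 1.0
        for j, q in enumerate(points):
            if j == idx:
                continue
            # Probability that q, once its gaps are filled, dominates inst.
            p_dom = sum(wq for qi, wq in instantiations(q, candidates)
                        if dominates(qi, inst))
            not_dominated *= 1.0 - p_dom
        total += w * not_dominated
    return total


def top_k_skyline(points, candidates, k):
    """Return the k (index, probability) pairs with the highest probability."""
    scored = [(skyline_probability(i, points, candidates), i)
              for i in range(len(points))]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by index
    return [(i, prob) for prob, i in scored[:k]]


# Toy dataset: two dimensions, None marks a missing value.
points = [(1, None), (2, 2), (3, 1), (None, 3)]
candidates = [1, 2, 3]
result = top_k_skyline(points, candidates, k=2)
```

On this toy dataset the first point can never be dominated, so it ranks first, followed by `(3, 1)`. The cost grows with the number of instantiations per point and quadratically with the number of points, which is the kind of overhead that pruning and sorting are meant to reduce.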