Sliding-Window Probabilistic Threshold Aggregate Queries on Uncertain Data Streams

Donghui Chen, Ling Chen
DOI: https://doi.org/10.1016/j.ins.2020.02.029
IF: 8.1
2020-01-01
Information Sciences
Abstract: Uncertain data streams are ubiquitous in many sensing and networking environments. Probabilistic aggregate queries, which return a probability distribution over the possible answers, are extensively used on such streams. In many monitoring applications, it is only necessary to know whether the result distribution exceeds user-defined thresholds. In this paper, we formalize two important query types: the sliding-window probabilistic threshold sum query and the sliding-window probabilistic threshold count query, which introduce two thresholds (probability and score) into the probability distribution. An intuitive solution is to use existing probabilistic aggregate algorithms to obtain the probability distribution and then apply the thresholds to it. However, this solution separates threshold processing from query processing and results in low efficiency. Our work uses Gaussian mixture models to represent uncertain data. Based on the Gaussian properties and probability theory of this model, we design efficient algorithms to answer these queries, which include filtering strategies and exact calculations. Several techniques (e.g., characteristic function, incremental calculation, pruning strategy, and state transition equation) are integrated into the exact calculations to improve time and space efficiency. Experiments on real and synthetic datasets demonstrate that our algorithms outperform existing algorithms.
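To make the threshold sum query concrete, the following is a minimal sketch under strong simplifying assumptions, not the paper's algorithm. It assumes each stream tuple is a single independent Gaussian (rather than a Gaussian mixture), so the window sum is itself Gaussian with summed means and variances, and it assumes the query asks whether P(sum > score) is at least the probability threshold. The names `gaussian_tail` and `SlidingThresholdSum` are hypothetical and chosen only for illustration.

```python
import math
from collections import deque


def gaussian_tail(mean, var, score):
    """Return P(X > score) for X ~ N(mean, var)."""
    if var <= 0.0:
        # Degenerate case: all probability mass sits at the mean.
        return 1.0 if mean > score else 0.0
    return 0.5 * math.erfc((score - mean) / math.sqrt(2.0 * var))


class SlidingThresholdSum:
    """Count-based sliding window over independent Gaussian tuples.

    Assumption: each tuple is one Gaussian (mean, var), not a mixture.
    """

    def __init__(self, window_size, score, prob):
        self.window = deque()
        self.window_size = window_size
        self.score = score      # score threshold on the sum
        self.prob = prob        # probability threshold
        self.mean = 0.0         # running mean of the window sum
        self.var = 0.0          # running variance of the window sum

    def push(self, mean, var):
        """Add a tuple, expire the oldest if needed, and answer the query."""
        self.window.append((mean, var))
        self.mean += mean
        self.var += var
        if len(self.window) > self.window_size:
            old_mean, old_var = self.window.popleft()
            # Incremental update: subtract the expired tuple's contribution.
            self.mean -= old_mean
            self.var -= old_var
        # Guard against tiny negative variance from floating-point drift.
        tail = gaussian_tail(self.mean, max(self.var, 0.0), self.score)
        return tail >= self.prob


if __name__ == "__main__":
    query = SlidingThresholdSum(window_size=3, score=10.0, prob=0.9)
    for m, v in [(3, 1), (3, 1), (3, 1), (6, 1), (6, 1)]:
        print(query.push(m, v))
```

Maintaining the running mean and variance avoids recomputing the window sum on every arrival, which is one simple form of incremental calculation. Handling true Gaussian mixtures would require tracking the mixture components of the sum, which this sketch does not attempt.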