Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream

XM Lin, HJ Lu, J Xu, JX Yu
DOI: https://doi.org/10.1109/icde.2004.1320011
2004-01-01
Abstract: Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, web log mining for access prediction, and user click stream mining for personalization. Among the various statistics, computing a quantile summary is probably the most challenging because of its complexity. In this paper we study the problem of continuously maintaining a quantile summary of the most recently observed $N$ elements over a stream so that quantile queries can be answered with a guaranteed precision of $\epsilon N$. We developed a space-efficient algorithm for a pre-defined $N$ that requires only one scan of the input data stream and $O(\log(\epsilon^2 N)/\epsilon + 1/\epsilon^2)$ space in the worst case. We also developed an algorithm that maintains quantile summaries for the most recent $N$ elements so that quantile queries on any most recent $n$ elements ($n \le N$) can be answered with a guaranteed precision of $\epsilon n$. The worst-case space requirement for this algorithm is only $O(\log^2(\epsilon N)/\epsilon^2)$. Our performance study indicated that not only is the actual quantile estimation error far below the guaranteed precision, but the space requirement is also much less than the given theoretical bound.
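To make the precision guarantee concrete, the following Python sketch shows an exact baseline for the most recent N elements and a check for whether an answer falls within the allowed rank error. It is not the paper's algorithm: it stores all N elements (O(N) space), which is exactly the cost the summaries described above are meant to avoid. The class name `ExactWindowQuantiles`, its method names, the rank convention (the φ-quantile is the element of rank ⌈φn⌉), and the requirement that an answer be an element currently in the window are all assumptions made for illustration.

```python
import bisect
import math
from collections import deque


class ExactWindowQuantiles:
    """Exact baseline (hypothetical helper): keeps all of the most recent
    `window_size` elements, so it needs O(N) space."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.arrival = deque()  # elements in arrival order
        self.ordered = []       # the same elements, kept sorted

    def insert(self, x):
        # Expire the oldest element once the window is full.
        if len(self.arrival) == self.window_size:
            oldest = self.arrival.popleft()
            del self.ordered[bisect.bisect_left(self.ordered, oldest)]
        self.arrival.append(x)
        bisect.insort(self.ordered, x)

    def quantile(self, phi):
        # Assumed convention: the element of 1-based rank ceil(phi * n).
        n = len(self.ordered)
        rank = max(1, math.ceil(phi * n))
        return self.ordered[rank - 1]

    def is_eps_approximate(self, value, phi, eps):
        # True if some rank of `value` in the current window lies within
        # eps * n of the target rank ceil(phi * n).
        n = len(self.ordered)
        target = max(1, math.ceil(phi * n))
        lo = bisect.bisect_left(self.ordered, value) + 1  # smallest rank
        hi = bisect.bisect_right(self.ordered, value)     # largest rank
        if lo > hi:
            return False  # assumption: answers must be in the window
        return lo <= target + eps * n and hi >= target - eps * n
```

For example, after inserting the integers 1 through 25 into a window of size 10, the window holds 16 through 25 and `quantile(0.5)` returns 20. With ε = 0.1 the allowed rank error is 1, so 21 (rank 6) is an acceptable answer for the median while 23 (rank 8) is not.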