Fine-grained Probability Counting for Cardinality Estimation of Data Streams.

Lun Wang, Tong Yang, Hao Wang, Jie Jiang, Zekun Cai, Bin Cui, Xiaoming Li
DOI: https://doi.org/10.1007/s11280-018-0583-0
2018-01-01
World Wide Web
Abstract: Estimating the number of distinct flows, also called the cardinality, is an important problem in many network applications, such as traffic measurement and anomaly detection. The challenge is to achieve high accuracy at line speed with little auxiliary memory. The Flajolet-Martin, LogLog, and HyperLogLog algorithms form a line of work in this area with successively improving performance. In this paper, we propose refined versions of these algorithms to achieve higher accuracy. The key observations are that (1) the “leftmost” hash functions used by these algorithms can be generalized to reach higher accuracy, and (2) the amendment coefficient can be highly biased on certain streams or datasets, so setting it dynamically instead of using the mathematically derived value can lead to much better accuracy. Experimental results show that the refined versions greatly improve accuracy and stability over the original algorithms.
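For orientation, the following is a minimal sketch of the standard HyperLogLog baseline, not the refined algorithms proposed in the paper. It shows where the two ingredients named in the abstract appear: the position of the leftmost 1-bit of a hash value, and a fixed bias-correction constant, which is presumably what the abstract calls the amendment coefficient. The names `rho` and `hll_estimate`, the use of SHA-1 as the hash, and the default of 2^10 registers are assumptions made for illustration.

```python
import hashlib
import math


def rho(w: int, bits: int) -> int:
    """1-based position of the leftmost 1-bit in a `bits`-wide word.

    Returns bits + 1 when w == 0 (no 1-bit present).
    """
    return bits - w.bit_length() + 1


def hll_estimate(items, b: int = 10) -> float:
    """Estimate the number of distinct items with m = 2**b registers."""
    m = 1 << b
    registers = [0] * m
    for item in items:
        # Illustrative 64-bit hash: first 8 bytes of SHA-1 (an assumption).
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - b)               # first b bits select a register
        w = h & ((1 << (64 - b)) - 1)     # remaining bits feed rho
        registers[idx] = max(registers[idx], rho(w, 64 - b))

    # Fixed, mathematically derived correction constant (valid for m >= 128).
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Standard small-range correction (linear counting on empty registers).
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)
    return raw
```

Calling `hll_estimate(range(100_000))` should return a value within a few percent of 100000, since the expected relative error is about 1.04 / sqrt(m). In this baseline `alpha` is fixed in advance; the abstract's second observation is that a dynamically chosen coefficient can be more accurate on some streams.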