FID-sketch: an Accurate Sketch to Store Frequencies in Data Streams

Tong Yang, Haowei Zhang, Hao Wang, Muhammad Shahzad, Xue Liu, Qin Xin, Xiaoming Li
DOI: https://doi.org/10.1007/s11280-018-0546-5
2018-01-01
World Wide Web
Abstract: Sketches are used extensively in a large number of real-world applications to estimate the frequencies of data items. Due to the unprecedented increase in the amount of Internet data and a relatively slower increase in the size of on-chip memories, existing sketches are becoming increasingly unable to keep the accuracy of the frequency estimates at an acceptable level. In this paper, we design a new sketch, called FID-sketch, that has significantly higher accuracy and a much smaller on-chip memory footprint than existing sketches. The key intuition behind the design of the FID-sketch is that, unlike prior sketches, before inserting an item it first estimates the current value of the frequency of that item stored in the sketch, and then increments as few counters as possible instead of incrementing a pre-determined fixed number of counters. We carried out extensive experiments to evaluate and compare the performance of FID-sketch with existing sketches on multi-core CPU and GPU platforms. Our experimental results show that FID-sketch significantly outperforms the state of the art, with 36.7 times lower relative error. We have released the source code of our proposed sketch and other related sketches that we implemented on GitHub [21].
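The "estimate first, then increment as few counters as possible" idea in the abstract can be illustrated with a minimal sketch. The code below is not the FID-sketch algorithm, whose details the abstract does not give. It is the well-known conservative-update variant of a Count-Min sketch, used here only as a stand-in for the general idea. The class name, the `depth` and `width` parameters, and the choice of BLAKE2b as the hash function are all assumptions made for illustration.

```python
import hashlib


class ConservativeUpdateSketch:
    """Illustrative stand-in (NOT FID-sketch): Count-Min with conservative update."""

    def __init__(self, depth=4, width=1024):
        # depth = number of counter rows, width = counters per row (assumed names).
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def estimate(self, item):
        # The smallest mapped counter is the frequency estimate.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))

    def insert(self, item):
        positions = [(r, self._index(r, item)) for r in range(self.depth)]
        # Step 1: estimate the frequency currently stored for this item.
        current = min(self.rows[r][c] for r, c in positions)
        # Step 2: increment only the counters that equal that estimate,
        # rather than all `depth` counters as a plain Count-Min sketch would.
        for r, c in positions:
            if self.rows[r][c] == current:
                self.rows[r][c] += 1
```

With insertions only, this variant never underestimates an item's frequency, and because counters that are already larger than the estimate are left untouched, collisions inflate the counters less than in a sketch that always increments a fixed number of them.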