ABC: A practicable sketch framework for non-uniform multisets

Junzhi Gong, Tong Yang, Yang Zhou, Dongsheng Yang, Shigang Chen, Bin Cui, Xiaoming Li
DOI: https://doi.org/10.1109/BigData.2017.8258193
2017-01-01
Abstract: A sketch is a data structure that records the frequencies of items in a multiset. Sketches are widely used in data streams, data graphs, distributed dataset processing, etc. They use little memory and run at high speed at the cost of a slight loss of accuracy. In practice, item frequencies in many datasets are non-uniformly distributed, and existing sketches do not work well on such datasets. To address this issue, we propose a new sketch framework, the ABC framework, which can be applied to most existing sketches and significantly improves their accuracy on non-uniform datasets. The key idea behind our framework is that when a counter overflows, it uses space from its adjacent counters through bit-borrowing and combination operations. Extensive experimental results show that our ABC framework improves the accuracy by 4.10 times and 4.49 times on average, respectively. A demo and all related source code are available on our homepage [1].
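The overflow-handling idea in the abstract can be illustrated with a small toy model. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class name `ToyCombiningCounters`, the fixed pairing of counters (2k, 2k+1), the single bit reserved as a "merged" flag, and the rule of keeping the sum on merge are all hypothetical choices made for illustration. It shows only the combination step; the bit-borrowing step is omitted.

```python
class ToyCombiningCounters:
    """Toy array of small counters; an overflowing counter merges with its pair neighbour.

    Hypothetical illustration only, not the ABC framework's actual layout.
    """

    def __init__(self, size, bits=4):
        if size % 2:
            raise ValueError("size must be even so counters can be paired")
        if bits < 2:
            raise ValueError("bits must be at least 2")
        self.limit = (1 << bits) - 1                   # max value of one small counter
        # Assumption: a merged pair has 2*bits of space, 1 bit of which is a flag.
        self.merged_limit = (1 << (2 * bits - 1)) - 1
        self.values = [0] * size
        self.merged = [False] * (size // 2)            # one flag per pair

    def increment(self, i):
        pair = i // 2
        left = 2 * pair
        if self.merged[pair]:
            # The pair already acts as one wide counter stored in the left slot.
            self.values[left] = min(self.values[left] + 1, self.merged_limit)
            return
        if self.values[i] < self.limit:
            self.values[i] += 1
            return
        # Overflow: combine both counters into one wide counter.
        # Keeping the sum means a query never underestimates either item.
        total = self.values[left] + self.values[left + 1] + 1
        self.values[left] = min(total, self.merged_limit)
        self.values[left + 1] = 0
        self.merged[pair] = True

    def query(self, i):
        pair = i // 2
        if self.merged[pair]:
            return self.values[2 * pair]
        return self.values[i]


def demo():
    counters = ToyCombiningCounters(size=4, bits=4)
    for _ in range(3):
        counters.increment(1)      # infrequent item
    for _ in range(20):
        counters.increment(0)      # frequent item, exceeds the 4-bit limit of 15
    return counters.query(0), counters.query(1), counters.query(2)
```

In this toy model a plain 4-bit counter would saturate at 15 for the frequent item, whereas the merged pair keeps counting at the price of overestimating the infrequent neighbour. That trade-off suits non-uniform data, where most counters hold small values and only a few need extra bits.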