SF-sketch: A Fast, Accurate, and Memory Efficient Data Structure to Store Frequencies of Data Items

Tong Yang, Lingtong Liu, Yibo Yan, Muhammad Shahzad, Yulong Shen, Xiaoming Li, Bin Cui, Gaogang Xie
DOI: https://doi.org/10.1109/ICDE.2017.50
2017-01-01
Abstract: A sketch is a probabilistic data structure that is used to record frequencies of items in a multi-set. Sketches have been applied in a variety of fields, such as data stream processing, natural language processing, and distributed data sets. In this paper, we propose a new sketch, called the Slim-Fat (SF) sketch, which has a much smaller memory footprint for queries while still supporting updates. The key idea behind our proposed SF-sketch is to maintain two separate sketches: a small sketch called the Slim-subsketch and a large sketch called the Fat-subsketch. The Slim-subsketch enables fast and accurate querying. The Fat-subsketch is used to assist insertions into and deletions from the Slim-subsketch. We implemented and evaluated SF-sketch along with several prior sketches and compared them side by side. Our experimental results show that SF-sketch significantly outperforms the most commonly used CM-sketch in terms of accuracy. The full version is provided at arXiv.org [12].
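For readers unfamiliar with sketches, the following is a minimal illustration of the CM-sketch (Count-Min sketch) baseline that the abstract compares against. It is not the SF-sketch itself, whose Slim/Fat update rules are not specified in the abstract. The class name `CountMinSketch`, the parameters `depth` and `width`, and the use of salted BLAKE2b as the hash family are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, depth=4, width=1024):
        # depth and width are hypothetical defaults chosen for the example.
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Insertion (count > 0) or deletion (count < 0) touches one counter per row.
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def query(self, item):
        # The minimum over the rows is the estimate; hash collisions can only
        # inflate it, provided only previously inserted items are deleted.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


sketch = CountMinSketch()
for word in ["a", "a", "a", "b"]:
    sketch.update(word)
sketch.update("a", -1)          # delete one occurrence of "a"
estimate_a = sketch.query("a")  # never below the true count of 2
```

In this baseline, every query reads the same counter arrays that updates write to. The SF-sketch described above instead separates a small structure used for queries from a larger one used to support updates.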