Bubble Sketch: A High-performance and Memory-efficient Sketch for Finding Top-K Items in Data Streams

Lu Cao, Qilong Shi, Yuxi Liu, Hanyue Zheng, Yao Xin, Wenjun Li, Tong Yang, Yangyang Wang, Yang Xu, Weizhe Zhang, Mingwei Xu
DOI: https://doi.org/10.1145/3627673.3679882
2024-01-01
Abstract: Sketch algorithms are crucial for identifying top-k items in large-scale data streams. Existing methods often trade performance against accuracy and cannot efficiently handle growing data volumes with limited memory. We present Bubble Sketch, a compact algorithm that excels in both performance and accuracy. Bubble Sketch achieves this by (1) recording full keys only for hot items, which significantly reduces memory usage, and (2) using threshold relocation to resolve conflicts, which improves detection accuracy. Unlike traditional methods, Bubble Sketch eliminates the need for a Min-Heap, ensuring fast processing speeds. Experiments show that Bubble Sketch outperforms the seven other algorithms compared, achieving the highest throughput and precision, and surpasses HeavyKeeper in accuracy by up to two orders of magnitude.