HeavyCache: A Generic Sketch for Summarizing Data Streams

Yikai Zhao, Yinda Zhang, Jie Jiang, Peng Liu, Yuhan Wu, Tong Yang
DOI: https://doi.org/10.1109/monetec60984.2024.10768126
2024-01-01
Abstract: Nowadays, massive data appears in the form of high-speed data streams. Performing various mining tasks on data streams, such as finding heavy hitters and estimating frequencies, is an important and challenging problem. Current applications often need to handle several tasks at the same time. Traditional sketch solutions rely on different data structures and algorithms for different tasks, so multiple data structures are needed in practice. In this paper, we propose a generic sketch algorithm, namely HeavyCache, which can quickly record each item and perform a broad spectrum of data mining tasks. The key idea is to leverage a cache mechanism to separate heavy items from light items. Specifically, HeavyCache accurately records the detailed information of heavy items, while only approximately recording the frequencies of light items. We show how HeavyCache is used to process six typical tasks. Experimental results show that HeavyCache significantly outperforms state-of-the-art solutions in terms of both accuracy and speed for each of the six tasks.
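The abstract does not give the data structure's details, so the following is only a minimal Python sketch of the general heavy/light separation idea, not the authors' HeavyCache algorithm. The class name `HeavyLightSketch`, its parameters, the single hashed counter array for light items, and the promotion rule are all assumptions made for illustration.

```python
import hashlib


class HeavyLightSketch:
    """Illustrative heavy/light separation (hypothetical, not the paper's design)."""

    def __init__(self, heavy_slots=4, light_counters=64):
        self.heavy_slots = heavy_slots
        self.heavy = {}                      # small cache: item -> count
        self.light = [0] * light_counters    # shared, collision-prone counters

    def _index(self, item):
        # Deterministic hash of the item into the light counter array.
        digest = hashlib.blake2b(repr(item).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.light)

    def insert(self, item):
        if item in self.heavy:               # cached item: count it directly
            self.heavy[item] += 1
            return
        if len(self.heavy) < self.heavy_slots:
            self.heavy[item] = 1             # free slot: admit directly
            return
        idx = self._index(item)
        self.light[idx] += 1                 # cache full: count approximately
        victim = min(self.heavy, key=self.heavy.get)
        if self.light[idx] > self.heavy[victim]:
            # Assumed promotion rule: the newcomer's estimate exceeds the
            # smallest cached count, so it replaces that cached item.
            estimate = self.light[idx]
            victim_count = self.heavy.pop(victim)
            v_idx = self._index(victim)
            self.light[v_idx] = max(self.light[v_idx], victim_count)
            self.heavy[item] = estimate

    def query(self, item):
        # Cached items use their cache count; others use the shared counter,
        # which may overestimate because of hash collisions.
        if item in self.heavy:
            return self.heavy[item]
        return self.light[self._index(item)]

    def heavy_hitters(self, threshold):
        return {k: v for k, v in self.heavy.items() if v >= threshold}


sketch = HeavyLightSketch(heavy_slots=2, light_counters=16)
for item in ["a"] * 100 + ["b"] * 50 + ["c", "d", "e"]:
    sketch.insert(item)
hitters = sketch.heavy_hitters(threshold=40)
```

In this toy version, a single structure answers both a frequency query (`query`) and a heavy-hitter query (`heavy_hitters`), which is the kind of multi-task use the abstract describes. Estimates for items outside the cache never undercount but may overcount.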