OneSketch: A Generic and Accurate Sketch for Data Streams
Zhuochen Fan,Ruixin Wang,Yalun Cai,Ruwen Zhang,Tong Yang,Yuhan Wu,Bin Cui,Steve Uhlig
DOI: https://doi.org/10.1109/tkde.2023.3278028
IF: 9.235
2023-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract:In this paper, we propose a generic sketch algorithm capable of achieving more accuracy in the following five tasks: finding top-$k$ frequent items, finding heavy hitters, per-item frequency estimation, and heavy changes in the time and spatial dimension. The state-of-the-art (SOTA) sketch solution for multiple measurement tasks is ElasticSketch (ES). However, the accuracy of its frequency estimation has room for improvement. The reason for this is that ES suffers from overestimation errors in the light part, which introduces errors when querying both frequent and infrequent items. To address these problems, we propose a generic sketch, OneSketch, designed to minimize overestimation errors. To achieve the design goal, we propose four key techniques, which embrace hash collisions and minimize possible errors by handling highly recurrent item replacements well. Experimental results show that OneSketch clearly outperforms 12 SOTA schemes. For example, compared with ES, OneSketch achieves more than 10× lower Average Absolute Error on finding top-$k$ frequent items and heavy hitters, as well as 48.3% and 38.4% higher F1 Scores on two heavy changes under 200KB memory, respectively.
computer science, information systems, artificial intelligence,engineering, electrical & electronic