PeriodicSketch: Finding Periodic Items in Data Streams

Zhuochen Fan, Yinda Zhang, Tong Yang, Mingyi Yan, Gang Wen, Yuhan Wu, Hongze Li, Bin Cui
DOI: https://doi.org/10.1109/icde53745.2022.00012
2022-01-01
Abstract: In this paper, we study periodic items in data streams, that is, items that arrive at a fixed interval. All existing work on mining periodic patterns is unsuited to data stream scenarios. To find periodic items in real time, we propose a novel sketch, PeriodicSketch, which aims to accurately record the top-$K$ periodic items. To the best of our knowledge, this is the first work to find periodic items in data streams. Any interval may occur many times, and we use frequency to denote the number of times an interval occurs. To pick out periodic items with high frequency, we propose a key technique called the Guaranteed Soft Uniform (GSU) replacement strategy. Our theoretical proofs show that when a replacement succeeds, the new item is more likely to have a higher frequency than the current smallest frequency, and that GSU ensures the items in the sketch approach the true periodic items more and more closely. Moreover, once the sketch holds all the periodic items, with high probability its state does not get worse. We conduct extensive experiments, and the results show that the Average Absolute Error (AAE) of our sketch using 1/10 of the memory is around 737 times (up to 2019 times) lower than that of the baseline solution. Finally, we provide a concrete use case, cache prefetching, which shows that PeriodicSketch can significantly improve the cache hit ratio. All related code for PeriodicSketch is open-sourced and available on GitHub [1].
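To make the definitions in the abstract concrete, the following is a minimal sketch of how the frequency of an (item, interval) pair could be counted exactly. It is not PeriodicSketch and does not implement GSU replacement: it stores every pair, so its memory is unbounded. The function name `exact_top_k_periodic`, the `(timestamp, item)` stream format, and the choice to measure an interval as the difference between consecutive arrivals of the same item are assumptions made for illustration.

```python
from collections import Counter


def exact_top_k_periodic(stream, k):
    """Return the k most frequent (item, interval) pairs with their counts.

    `stream` is assumed to be an iterable of (timestamp, item) pairs in
    arrival order (a hypothetical format chosen for this illustration).
    """
    last_seen = {}    # item -> timestamp of its most recent arrival
    freq = Counter()  # (item, interval) -> number of times the interval occurred
    for ts, item in stream:
        if item in last_seen:
            interval = ts - last_seen[item]
            freq[(item, interval)] += 1
        last_seen[item] = ts
    return freq.most_common(k)


# Item "a" arrives every 3 time units; item "b" arrives irregularly.
stream = [(0, "a"), (1, "b"), (3, "a"), (4, "b"), (6, "a"), (9, "a"), (10, "b")]
top = exact_top_k_periodic(stream, 2)
print(top[0])  # the pair ("a", 3) occurs three times
```

An exact counter like this needs one entry per distinct (item, interval) pair, which is the kind of cost a fixed-memory sketch is meant to avoid.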