HyperCalm Sketch: One-Pass Mining Periodic Batches in Data Streams.
Zirui Liu, Chaozhe Kong, Kaicheng Yang, Tong Yang, Ruijie Miao, Qizhi Chen, Yikai Zhao, Yaofeng Tu, Bin Cui
DOI: https://doi.org/10.1109/icde55515.2023.00009
2023-01-01
Abstract: Batch is an important pattern in data streams. A batch is a group of identical items that arrive close together in time. We find that some special batches that arrive periodically are of great value. In this paper, we formally define a new pattern, namely periodic batches. A group of periodic batches refers to several batches of the same item that arrive periodically. Studying periodic batches is important in many applications, such as caches, financial markets, online advertising, and networks. We propose a one-pass sketching algorithm, namely the HyperCalm sketch, which works in two phases to detect periodic batches in real time. In phase 1, we propose a time-aware Bloom filter, namely HyperBloomFilter (HyperBF), to detect the start of batches. In phase 2, we propose an enhanced top-k algorithm, called Calm Space-Saving (CalmSS), to report the top-k periodic batches. We theoretically derive error bounds for HyperBF and CalmSS. Extensive experiments show that HyperCalm outperforms the strawman solutions by 4× in terms of average relative error and by 13.2× in terms of speed. We also apply HyperCalm to a cache system and integrate it into Apache Flink. All related code is open-sourced.
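To make the two-phase idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' HyperBF or CalmSS. Phase 1 uses a generic timestamp Bloom filter: each cell stores the last time any item hashed to it, and an arrival counts as a batch start unless all of the item's cells were touched within `window`. Phase 2 quantizes the gap between an item's consecutive batch starts into a `period` and counts `(item, period)` keys with classic Space-Saving, without CalmSS's enhancements. The class and function names, the `window`/`bucket`/`capacity` parameters, and the exact per-item `last_start` table are hypothetical simplifications chosen for illustration.

```python
import hashlib


class TimestampBloomFilter:
    """Hypothetical time-aware Bloom filter (not HyperBF's actual layout).

    Each cell holds the most recent arrival time of any item hashed to it.
    An item is "seen recently" only if all of its cells are within `window`.
    """

    def __init__(self, num_cells: int, num_hashes: int, window: float):
        self.cells = [float("-inf")] * num_cells
        self.num_hashes = num_hashes
        self.window = window

    def _indexes(self, item):
        # Derive k cell indexes from salted BLAKE2b digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % len(self.cells)

    def is_batch_start(self, item, t: float) -> bool:
        idx = list(self._indexes(item))
        recent = all(t - self.cells[j] <= self.window for j in idx)
        for j in idx:
            self.cells[j] = t  # refresh the item's cells to the current time
        return not recent


class SpaceSaving:
    """Classic Space-Saving top-k counter (not the CalmSS variant)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts = {}

    def insert(self, key):
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Evict the minimum-count key; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.get)
            self.counts[key] = self.counts.pop(victim) + 1

    def top(self, k: int):
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:k]


def mine_periodic_batches(stream, window, bucket, k, capacity=64, cells=1024, hashes=3):
    """Return the top-k (item, period) keys from a time-ordered (item, t) stream."""
    bf = TimestampBloomFilter(cells, hashes, window)
    ss = SpaceSaving(capacity)
    last_start = {}  # hypothetical exact table of each item's previous batch start
    for item, t in stream:
        if not bf.is_batch_start(item, t):
            continue  # phase 1: not the first item of a batch
        if item in last_start:
            # Phase 2: quantize the gap between batch starts into a period.
            period = round((t - last_start[item]) / bucket) * bucket
            ss.insert((item, period))
        last_start[item] = t
    return ss.top(k)
```

Feed `mine_periodic_batches` `(item, timestamp)` pairs in time order, with `window` set below the expected gap between batches. It ranks `(item, period)` keys by how many batch starts repeated that period. Keying the top-k structure on the pair rather than on the item alone lets the same item appear under different periods.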