A Unified Framework for Mining Batch and Periodic Batch in Data Streams

Zirui Liu, Xiangyuan Wang, Yuhan Wu, Tong Yang, Kaicheng Yang, Hailin Zhang, Yaofeng Tu, Bin Cui
DOI: https://doi.org/10.1109/tkde.2024.3399024
2024-01-01
Abstract: A batch is an important pattern in data streams: a group of identical items that arrive close together in time. We find that some special batches, those that arrive periodically, are of great value. In this paper, we formally define a new pattern, namely periodic batches. A group of periodic batches consists of several batches of the same item that arrive periodically. Studying periodic batches is important in many applications, such as caching, financial markets, online advertising, and networking. This paper proposes a unified framework, namely the HyperCalm sketch, to detect batches and periodic batches in data streams. The HyperCalm sketch detects periodic batches in two phases. In phase 1, we propose a time-aware Bloom filter, called HyperBloomFilter (HyperBF), to detect batches. In phase 2, we propose an enhanced top-k algorithm, called Calm Space-Saving (CalmSS), to report the top-k periodic batches. Extensive experiments show that HyperCalm outperforms the strawman solutions by 4× in terms of average relative error and by 98.1× in terms of speed. All related code is open-sourced.
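The two-phase pipeline described in the abstract can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the authors' HyperBF or CalmSS: it assumes a batch is a run of arrivals of the same item whose gaps do not exceed a threshold `threshold`, uses a plain timestamp-per-cell Bloom filter for phase 1, and uses classic Space-Saving over (item, quantized period) keys for phase 2. The names `TimeAwareBloomFilter`, `SpaceSaving`, `detect_periodic_batches`, and the parameters `threshold`, `granularity`, and `capacity` are hypothetical and chosen for illustration only.

```python
import hashlib


class TimeAwareBloomFilter:
    """Simplified time-aware Bloom filter (an assumption, not the paper's HyperBF).

    Each cell stores the latest timestamp of any item hashed to it. An item is
    "recently seen" if all of its k cells hold timestamps within `threshold`
    of now; otherwise its arrival is treated as the start of a new batch.
    Hash collisions can cause false "recent" answers, i.e. missed batch starts.
    """

    def __init__(self, num_cells: int, num_hashes: int, threshold: float):
        self.cells = [float("-inf")] * num_cells
        self.k = num_hashes
        self.threshold = threshold

    def _indexes(self, item: str):
        # k independent hashes derived from a seeded BLAKE2b digest.
        for seed in range(self.k):
            digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % len(self.cells)

    def insert(self, item: str, now: float) -> bool:
        """Record an arrival; return True if it starts a new batch."""
        idx = list(self._indexes(item))
        recent = all(now - self.cells[i] <= self.threshold for i in idx)
        for i in idx:
            self.cells[i] = now
        return not recent


class SpaceSaving:
    """Classic Space-Saving top-k counter (CalmSS enhances this; not reproduced here)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts = {}

    def insert(self, key) -> None:
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Evict the minimum-count key and inherit its count plus one.
            victim = min(self.counts, key=self.counts.get)
            self.counts[key] = self.counts.pop(victim) + 1

    def top(self, k: int):
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:k]


def detect_periodic_batches(stream, threshold, granularity, capacity, k):
    """Phase 1 finds batch starts; phase 2 counts (item, quantized period) pairs."""
    bf = TimeAwareBloomFilter(num_cells=1 << 12, num_hashes=3, threshold=threshold)
    ss = SpaceSaving(capacity)
    last_batch_start = {}  # exact per-item state: a simplification for clarity
    for item, t in stream:
        if bf.insert(item, t):
            if item in last_batch_start:
                period = round((t - last_batch_start[item]) / granularity)
                ss.insert((item, period))
            last_batch_start[item] = t
    return ss.top(k)


# Usage: "a" arrives in bursts every 10 time units; "b" arrives irregularly.
stream = [("a", 0), ("a", 1), ("a", 2), ("b", 5), ("a", 10), ("a", 11), ("a", 12),
          ("b", 13), ("a", 20), ("a", 21), ("a", 22), ("b", 27), ("a", 30), ("a", 31)]
print(detect_periodic_batches(stream, threshold=3, granularity=1, capacity=8, k=1))
```

Keying Space-Saving on the (item, period) pair rather than on the item alone means an item whose batches recur at a stable interval accumulates one large count, while an item with irregular gaps spreads its count over many keys and stays out of the top-k.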