BurstSketch: Finding Bursts in Data Streams

Ruijie Miao, Zheng Zhong, Jiarui Guo, Zikun Li, Tong Yang, Bin Cui
DOI: https://doi.org/10.1109/tkde.2022.3223686
IF: 9.235
2022-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: A burst is a common pattern in data streams, characterized by a sudden increase in arrival rate followed by a sudden decrease. Burst detection has attracted extensive attention from the research community. To detect bursts accurately in real time, we propose a novel sketch, BurstSketch, which consists of two stages. Stage 1 uses a technique called Running Track to select potential burst items efficiently. Stage 2 monitors these potential burst items and captures the key features of the burst pattern with a technique called Snapshotting. We further propose an optimization, Dynamic Buckets, which improves the accuracy of BurstSketch. We provide theoretical error bounds for Stage 1, Stage 2, and the optimized version. Experimental results show that, compared with the strawman solution, BurstSketch achieves 2.00 to 11.63 times higher F1 score and 1.56 times higher throughput. We also integrate BurstSketch into Apache Flink and show that using BurstSketch can be faster than simply using the built-in APIs provided by Apache Flink.
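The abstract describes a burst only informally (a sudden rise in arrival rate followed by a sudden drop). To make the target pattern concrete, here is a minimal sketch of an exact, memory-unbounded detector that keeps a full counter for every item in every time window. It is not BurstSketch and not necessarily the paper's strawman. The burst rule and the parameters `ratio`, `threshold`, and `max_len` are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def detect_bursts(
    windows: List[Iterable[str]],
    ratio: float = 2.0,
    threshold: int = 10,
    max_len: int = 3,
) -> List[Tuple[str, int, int]]:
    """Exact burst detection over a list of time windows (illustrative only).

    Assumed burst rule for an item with per-window frequency f:
      * starts at window s if f[s] >= threshold and f[s] >= ratio * f[s - 1]
      * ends at window e if f[e] * ratio <= f[e - 1] (sudden decrease)
      * is reported only if it stayed high for at most max_len windows
    Returns (item, s, e) tuples sorted by item, then by s.
    """
    counts = [Counter(w) for w in windows]  # exact per-window frequencies
    items = set().union(*counts)  # every item seen in any window
    bursts: List[Tuple[str, int, int]] = []
    for item in sorted(items):
        freq = [c[item] for c in counts]  # Counter returns 0 for absent items
        start = None  # window where the current candidate burst began
        for t in range(1, len(freq)):
            prev, cur = freq[t - 1], freq[t]
            if start is None:
                # A sudden increase to a high level opens a candidate burst.
                if cur >= threshold and cur >= ratio * prev:
                    start = t
            elif cur * ratio <= prev:
                # A sudden decrease closes it; keep it only if it was short.
                if t - start <= max_len:
                    bursts.append((item, start, t))
                start = None
            elif t - start >= max_len:
                # Still high after max_len windows: a level shift, not a burst.
                start = None
    return bursts
```

For example, if item `a` appears 5, 30, 28, and 4 times in four consecutive windows while `b` stays at 20, only `a` is reported, as `("a", 1, 3)`. Keeping exact per-item, per-window counts is the memory cost a compact sketch aims to avoid; per the abstract, BurstSketch's Stage 1 first selects potential burst items so that only those are monitored in Stage 2.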
computer science, information systems, artificial intelligence, engineering, electrical & electronic
What problem does this paper attempt to address?