Thresholded Monitoring in Distributed Data Streams
Meng Li, Haipeng Dai, Xiaoyu Wang, Rui Xia, Alex X. Liu, Guihai Chen
DOI: https://doi.org/10.1109/tnet.2020.2979654
2020-01-01
IEEE/ACM Transactions on Networking
Abstract: In this paper, we consider the problem of thresholded monitoring in distributed data streams: given multiple distributed data streams observed by multiple monitors during a certain period, find the items whose global frequencies over all data streams exceed a given threshold. We first derive a lower bound on the communication overhead of any deterministic algorithm for this problem. We then propose two schemes: Low-threshold Cascaded Cuckoo Filter (L-CCF) for low-threshold monitoring and High-threshold Cascaded Cuckoo Filter (H-CCF) for high-threshold monitoring. L-CCF and H-CCF identify the items whose frequencies exceed the given threshold while achieving a desired false negative rate (FNR) and optimizing communication overhead. The key idea is to compress the communication overhead of transferring ID and frequency information at the same time. First, to reduce the overhead of transferring IDs, we encode each ID into separate tiny parts and store these parts in L-CCF or H-CCF. Second, to reduce the overhead of transferring frequencies, we adopt a carry-in counter technique in L-CCF and a multiple sampling technique in H-CCF. We evaluated L-CCF and H-CCF on two real-world traces and compared their performance with two adapted prior algorithms. Our experimental results show that, on average, L-CCF and H-CCF achieve FNRs that are 55.7% and 65.56% better, respectively, than those of the comparison algorithms, while the false positive rate (FPR) is maintained at the level of 2.23%.
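
To make the problem statement concrete, the following is a minimal Python sketch of a naive exact baseline. It is not the paper's L-CCF or H-CCF. It assumes every monitor ships its full table of item IDs and local counts to a coordinator, which is the communication cost the proposed schemes aim to compress. The function names, the sample streams, and the reading of "exceed" as strictly greater than the threshold are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set


def local_counts(stream: Iterable[str]) -> Counter:
    """Count item frequencies observed by one monitor (hypothetical helper)."""
    return Counter(stream)


def thresholded_items(monitor_streams: List[Iterable[str]], threshold: int) -> Set[str]:
    """Return items whose global frequency strictly exceeds `threshold`.

    Naive baseline: the coordinator receives every (ID, count) pair from
    every monitor and sums them, so there are no false negatives or false
    positives, but communication grows with the number of distinct items.
    """
    global_counts: Counter = Counter()
    for stream in monitor_streams:
        # Each monitor's full count table is merged into the global view.
        global_counts.update(local_counts(stream))
    return {item for item, freq in global_counts.items() if freq > threshold}


# Three monitors observing made-up streams (illustrative data only).
streams = [
    ["a", "b", "a", "c"],
    ["a", "c", "c"],
    ["b", "a"],
]
result = thresholded_items(streams, threshold=3)
print(result)
```

With these sample streams the global counts are a=4, b=2, and c=3, so only "a" exceeds a threshold of 3. No single monitor sees "a" more than twice, which is why the decision has to be made on aggregated information rather than locally.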