Finding Recently Persistent Flows in High-Speed Packet Streams Based on Cuckoo Filter

Qingjun Xiao, Yifei Li, Yeke Wu
DOI: https://doi.org/10.1016/j.comnet.2023.110097
IF: 5.493
2023-01-01
Computer Networks
Abstract: In high-speed networks, flow-level traffic measurement is an essential tool for understanding how network bandwidth is consumed and for supporting the detection of anomalous traffic. While much prior work focuses on tracking frequent flows (a.k.a. heavy hitters), in this paper we focus on tracking persistent flows, a notion that jointly considers the frequency, duration, and regularity of a flow's packet arrival events. Although this more general metric, called persistence, has been defined before, it is still unknown how to use the limited memory on the data plane to monitor the top-k persistent flows in a recent time interval. We propose an algorithm named PFD-DW, built upon the cuckoo filter (an improved version of the hash table with better memory efficiency), to monitor the top-k persistent flows under a time-decaying window. It can be regarded as a variant of the cuckoo filter that transforms each bucket into a bucket-level min-heap. Its advantage is that, when the table is full and a packet of a new flow arrives, it can select the least persistent flow along the cuckoo kicking path as the victim of flow replacement. We deliberately avoid scanning the entire table in order to maintain high time efficiency. Based on real-world network traffic traces, we evaluate the performance of our DAKP-CF, a degraded version of PFD-DW that only considers each flow’s packet arrival frequency. The results show that it outperforms the existing algorithms for the top-k frequent flow identification task: it provides nearly 98% identification accuracy with roughly 50% less memory cost. We also evaluate our PFD-DW algorithm on the more general task of identifying the recently persistent flows. It identifies 93% of the top-3000 persistent flows using only 576KB of memory, while attaining a 1.5% persistence estimation error. Its memory cost is 97% lower than that of another proposed solution, PFD-SW, which uses a sliding time window.
We have also developed a prototype of PFD-DW, based on the Fd.io VPP software switch accelerated by Intel DPDK.
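The replacement rule described in the abstract (when the table is full, evict the least persistent flow found along the cuckoo kicking path rather than scanning the whole table) can be illustrated with a minimal Python sketch. This is not the authors' PFD-DW or DAKP-CF code. The class name `MinEvictCuckooTable`, the 16-bit fingerprint, the hash choices, the bound `max_kicks`, and the rule that a new flow starts with a counter of 1 are all assumptions made for illustration. The sketch tracks only a per-flow packet counter (closer in spirit to the frequency-only variant), omits the time-decaying window, and finds each bucket's minimum with a linear scan instead of a bucket-level min-heap.

```python
import hashlib


class MinEvictCuckooTable:
    """Illustrative cuckoo table that evicts the smallest counter on the kick path."""

    def __init__(self, num_buckets=1024, slots=4, max_kicks=8):
        # A power-of-two bucket count makes the XOR in _alt() its own inverse.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.max_kicks = max_kicks
        # Each slot is None (empty) or a mutable [fingerprint, counter] pair.
        self.table = [[None] * slots for _ in range(num_buckets)]

    def _fp_and_bucket(self, key: bytes):
        h = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
        fp = (h & 0xFFFF) or 1          # 16-bit fingerprint, never zero
        return fp, (h >> 16) & self.mask

    def _alt(self, bucket, fp):
        # Partial-key cuckoo hashing: the other candidate bucket of a fingerprint.
        return bucket ^ ((fp * 0x5BD1E995) & self.mask)

    @staticmethod
    def _count(entry):
        return -1 if entry is None else entry[1]   # empty slots sort first

    def _min_slot(self, bucket):
        row = self.table[bucket]
        return min(range(len(row)), key=lambda s: self._count(row[s]))

    def _find(self, key: bytes):
        fp, b1 = self._fp_and_bucket(key)
        for b in (b1, self._alt(b1, fp)):
            for entry in self.table[b]:
                if entry is not None and entry[0] == fp:
                    return entry
        return None

    def estimate(self, key: bytes) -> int:
        entry = self._find(key)
        return 0 if entry is None else entry[1]

    def update(self, key: bytes) -> None:
        entry = self._find(key)
        if entry is not None:            # known flow: bump its counter
            entry[1] += 1
            return
        fp, b1 = self._fp_and_bucket(key)
        for b in (b1, self._alt(b1, fp)):
            row = self.table[b]
            if None in row:              # free slot in a candidate bucket
                row[row.index(None)] = [fp, 1]
                return
        # Both candidate buckets are full: walk a bounded kicking path,
        # recording the minimum-counter slot of every bucket visited.
        path, seen, b = [], set(), b1
        while len(path) < self.max_kicks and b not in seen:
            seen.add(b)
            s = self._min_slot(b)
            path.append((b, s))
            if self.table[b][s] is None:  # reached an empty slot: stop here
                break
            b = self._alt(b, self.table[b][s][0])
        # The victim is the smallest counter seen on the path (or an empty slot).
        v = min(range(len(path)),
                key=lambda j: self._count(self.table[path[j][0]][path[j][1]]))
        # Shift entries one hop along the path into the vacated slot.
        for j in range(v, 0, -1):
            (bj, sj), (bp, sp) = path[j], path[j - 1]
            self.table[bj][sj] = self.table[bp][sp]
        b0, s0 = path[0]
        self.table[b0][s0] = [fp, 1]
```

Each update touches at most `max_kicks` buckets, so the cost per packet stays bounded regardless of table size. Because only fingerprints are stored, two flows that share a fingerprint and a candidate bucket are counted together, so estimates in this sketch can overcount.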