Diamond Sketch: Accurate Per-flow Measurement for Big Streaming Data
Tong Yang, Siang Gao, Zhouyi Sun, Yufei Wang, Yulong Shen, Xiaoming Li
DOI: https://doi.org/10.1109/tpds.2019.2923772
IF: 5.3
2019-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Per-flow measurement is a critical issue in computer networks, and one of its key tasks is to count the number of packets in each flow (for big streaming data). The literature has demonstrated that sketches are the most memory-efficient data structures for this counting task, and they are widely used in distributed systems. Existing sketches often use many counters of the same size to record the number of packets in each flow, so every counter must be large enough to accommodate the largest flow. Unfortunately, most flows are small (i.e., mice flows) and only a few are large (i.e., elephant flows), so many counters hold very small values, which wastes memory. Because sketches are often stored in fast but expensive memory (e.g., SRAM), achieving high memory efficiency is critical. To address this issue, we propose a novel sketch, the Diamond sketch. The Diamond sketch is composed of atom sketches, each of which uses small counters. The key idea of Diamond is to dynamically assign an appropriate number of atom sketches to each flow on demand, thereby optimizing memory efficiency. Experimental results show that the Diamond sketch achieves up to 508.3 times lower relative error than the best of five typical sketches, while keeping comparable speed. We have made the source code of all six sketches available on GitHub [1].
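The memory argument in the abstract (uniform counters sized for the largest flow versus small counters assigned on demand) can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only an illustration under stated assumptions, not the Diamond sketch algorithm: it ignores hashing and collisions, as well as the bookkeeping any real scheme needs to determine how many small counters a flow occupies. The helper names (`bits_needed`, `fixed_width_bits`, `on_demand_bits`), the `atom_bits` parameter, and the skewed flow-size distribution are all hypothetical and do not come from the paper's experiments.

```python
import math


def bits_needed(value: int) -> int:
    """Bits required to store a non-negative integer (at least 1 bit)."""
    return max(1, value.bit_length())


def fixed_width_bits(counts: list[int]) -> int:
    """Total memory if every counter is as wide as the largest flow requires."""
    width = bits_needed(max(counts))
    return width * len(counts)


def on_demand_bits(counts: list[int], atom_bits: int) -> int:
    """Total memory if each flow gets just enough atom_bits-wide counters.

    Hypothetical illustration only: no hashing, no collisions, and no
    overhead for recording how many small counters each flow uses.
    """
    total = 0
    for c in counts:
        atoms = math.ceil(bits_needed(c) / atom_bits)
        total += atoms * atom_bits
    return total


# Invented skewed workload: many mice flows, a few medium flows, one elephant.
counts = [3] * 990 + [200] * 9 + [1_000_000]

print(fixed_width_bits(counts))             # 20000 (1000 counters x 20 bits)
print(on_demand_bits(counts, atom_bits=4))  # 4052
```

With these made-up numbers, sizing every counter for the single elephant flow costs about five times more memory than giving each flow only as many 4-bit counters as its own count needs. This gap is the kind of waste that on-demand assignment of small-counter atom sketches targets.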