Abstract: For many practical applications, a fundamental problem is to estimate flow cardinalities over big network data consisting of numerous flows (especially a large quantity of mouse flows mixed with a small number of elephant flows, whose cardinalities follow a power-law distribution). Traditionally, research on this problem has focused on using a small amount of memory to estimate each flow's cardinality over a large range (up to 10^9). However, although the memory needed for each individual flow has been greatly compressed, the overall memory demand can still be very high when there is an extremely large number of flows. It can exceed what is available in some important scenarios, such as implementing online measurement modules in network processors using only on-chip cache memory. In this paper, instead of allocating a separate data structure (called an estimator) to each flow, we take a different path by viewing all the flows together as a whole: each flow is allocated a virtual estimator, and these virtual estimators share a common memory space. We find that sharing at the multi-bit register level is superior to sharing at the bit level. We propose a unified framework of virtual estimators that allows us to apply the idea of sharing to an array of cardinality estimation solutions, e.g., HyperLogLog and PCSA, achieving far better memory efficiency than the best existing work. Our experiment shows that the new solution can work in a tight memory space of less than 1 bit per flow, or even one tenth of a bit per flow, which has never been achieved before.

Manuscript received September 1, 2016; revised May 5, 2017; accepted September 6, 2017; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor M. Li. Date of publication October 9, 2017; date of current version December 15, 2017. The preliminary version of this paper, titled “Hyper-Compact Virtual Estimators for Big Network Data Based on Register Sharing,” was published in the Proceedings of ACM SIGMETRICS, pp. 417–428, June 2015.

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFB1003000, in part by the National Natural Science Foundation of China under Grant 61502098, Grant 61632008, and Grant 61320106007, in part by Jiangsu Provincial Natural Science Foundation of China under Grant BK20150629, in part by the National Science Foundation of United States under Grant CNS1719222 and Grant STC-1562485, in part by a grant from the Florida Cybersecurity Center, in part by the Jiangsu Provincial Key Laboratory of Network and Information Security under Grant BM2003201, in part by the Key Laboratory of Computer Network and Information Integration of Ministry of Education of China under Grant 93K-9, and in part by the Collaborative Innovation Center of Novel Software Technology and Industrialization. (Corresponding author: Shigang Chen.)

Q. Xiao, J. Luo, and T. Li are with the School of Computer Science and Engineering, Southeast University, Nanjing 210018, China (e-mail: csqjxiao@seu.edu.cn; jluo@seu.edu.cn; freya.li.tengli@gmail.com). S. Chen and Y. Zhou are with the Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611 USA (e-mail: sgchen@cise.ufl.edu; youzhou@cise.ufl.edu). M. Chen was with the Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611 USA. He is now with Google Inc., Mountain View, CA 94043 USA (e-mail: minchen@google.com). Y. Ling is with the Applied Research Laboratories, Telcordia Technologies, Morristown, NJ 07960 USA (e-mail: lingy@research.telcordia.com).

Digital Object Identifier 10.1109/TNET.2017.2753842

Estimating Cardinality for Arbitrarily Large Data Stream with Improved Memory Efficiency

Better with Fewer Bits: Improving the Performance of Cardinality Estimation of Large Data Streams

Cardinality Estimation for Elephant Flows

Cardinality Estimation for Elephant Flows: A Compact Solution Based on Virtual Register Sharing

Couper: Memory-Efficient Cardinality Estimation under Unbalanced Distribution

In Search of a Memory-Efficient Framework for Online Cardinality Estimation

A Generic Sketch for Estimating Super-Spreaders and Per-Flow Cardinality Distribution in High-Speed Data Streams

Fine-grained Probability Counting for Cardinality Estimation of Data Streams

Hyper-Compact Virtual Estimators for Big Network Data Based on Register Sharing

CardSketch: Shift Attention for Network-wide Cardinality Telemetry

Simple and Efficient Cardinality Estimation in Data Streams

SuperGuardian: Superspreader Removal for Cardinality Estimation in Data Streaming

Erasable Virtual HyperLogLog for Approximating Cumulative Distribution over Data Streams

Virtual self-adaptive bitmap for online cardinality estimation

A Memory-Compact and Fast Sketch for Online Tracking Heavy Hitters in a Data Stream

Utilizing Dynamic Properties of Sharing Bits and Registers to Estimate User Cardinalities over Time

QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams

Fine-Grained Probability Counting: Refined LogLog Algorithm

Accurate and O(1)-Time Query of Per-Flow Cardinality in High-Speed Networks

Cardinalities estimation under sliding time window by sharing HyperLogLog Counter

Cardinality Estimation Meets Good-Turing