Approximate Counting with a Floating-Point Counter

Miklós Csűrös
DOI: https://doi.org/10.1007/978-3-642-14031-0_39
2010-01-01
Abstract: When many objects are counted simultaneously in large data streams, as in the course of network traffic monitoring, or Webgraph and molecular sequence analyses, memory becomes a limiting factor. Robert Morris [Communications of the ACM, 21:840–842, 1978] proposed a probabilistic technique for approximate counting that is extremely economical. The basic idea is to increment a counter containing the value X with probability $2^{-X}$. As a result, the counter contains an approximation of $\lg n$ after n probabilistic updates, stored in $\lg\lg n$ bits. Here we revisit the original idea of Morris. We introduce a binary floating-point counter that combines a d-bit significand with a binary exponent, stored together on $d+\lg\lg n$ bits. The counter yields a simple formula for an unbiased estimation of n with a standard deviation of about $0.6\cdot n2^{-d/2}$. We analyze the floating-point counter's performance in a general framework that applies to any probabilistic counter. In that framework, we provide practical formulas to construct unbiased estimates, and to assess the asymptotic accuracy of any counter.
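The counting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the class name `FloatingPointCounter`, its methods, and the `rng` parameter are hypothetical. The sketch also assumes a particular update rule and estimator, since the abstract gives neither. The counter value X is split into an exponent t (the bits above the lowest d) and a d-bit significand u, X is incremented with probability $2^{-t}$, and n is estimated as $(2^d+u)2^t-2^d$. Under that assumption each step from X to X+1 takes $2^t$ events on average and raises the estimate by exactly $2^t$, which is why the estimate is unbiased. With d = 0 the sketch reduces to the Morris counter, with update probability $2^{-X}$ and estimate $2^X-1$.

```python
import random


class FloatingPointCounter:
    """Illustrative probabilistic counter with a d-bit significand.

    Assumption (not stated in the abstract): the stored value x holds
    the exponent t in its high bits and the significand u in its low
    d bits, and each event increments x with probability 2**-t.
    """

    def __init__(self, d, rng=None):
        self.d = d                      # number of significand bits
        self.x = 0                      # the only state that is stored
        self.rng = rng or random.Random()

    def increment(self):
        """Register one event; x grows only with probability 2**-t."""
        t = self.x >> self.d            # current exponent
        # random() lies in [0, 1), so for t == 0 the update always happens
        if self.rng.random() < 2.0 ** -t:
            self.x += 1

    def estimate(self):
        """Return the assumed unbiased estimate (2**d + u) * 2**t - 2**d."""
        m = 1 << self.d
        t = self.x >> self.d            # exponent
        u = self.x & (m - 1)            # significand
        return (m + u) * (1 << t) - m


if __name__ == "__main__":
    counter = FloatingPointCounter(d=8, rng=random.Random(2010))
    for _ in range(100_000):
        counter.increment()
    print("stored value:", counter.x, "estimate:", counter.estimate())
```

In this sketch the first $2^d$ events are counted exactly, because the exponent is still zero and every update succeeds. Only afterwards does the counter become probabilistic, which gives an intuition for why a larger d lowers the relative error.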