On the Evolutionary of Bloom Filter False Positives - An Information Theoretical Approach to Optimizing Bloom Filter Parameters
Zhuochen Fan, Gang Wen, Zhipeng Huang, Yang Zhou, Qiaobin Fu, Tong Yang, Alex X. Liu, Bin Cui
DOI: https://doi.org/10.1109/tkde.2022.3200045
IF: 9.235
2022-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: The fundamental issue of how to calculate the false positive probability of the widely used Bloom Filter (BF), from which the optimal value of $k$ is conventionally derived, remains elusive. Bloom gave the false positive formula in 1970; in 2008, Bose et al. pointed out that Bloom's formula is flawed; and in 2010, Christensen et al. pointed out that Bose's formula is also flawed and gave another formula. Although Christensen's formula is perfectly accurate, it is time-consuming to evaluate and makes it impossible to calculate the optimal value of $k$. Based on the observation that, for a BF with $m$ bits and $n$ elements, the false positive probability is the smallest if and only if the entropy is the largest, we propose the first approach to calculating the optimal $k$ without any false positive formula. Furthermore, we propose a new and more accurate upper bound for the false positive probability. When the size of a Bloom Filter becomes infinitely large, our upper bound becomes equal to the lower bound, which is Bloom's formula, and this deepens our understanding of that formula. In addition, we derive bounds on the correct rate of Counting Bloom Filters (CBFs) by applying our proposed BF formulas to them.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
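
The entropy observation in the abstract can be illustrated with a minimal Python sketch. This is not the authors' derivation: it assumes uniform, independent hash functions, treats each bit on its own, and uses Bloom's classic 1970 approximation, which the abstract notes is not exact. Under those assumptions a given bit stays 0 with probability $(1 - 1/m)^{kn}$, the per-bit entropy is largest when that probability is $1/2$, and solving for $k$ (the number of hash functions) gives a value close to $(m/n)\ln 2$. The function names `bloom_fpp`, `bit_entropy`, and `entropy_optimal_k` are hypothetical and chosen only for this example.

```python
import math


def bloom_fpp(m: int, n: int, k: int) -> float:
    """Bloom's classic approximation of the false positive probability.

    Assumption: bits are treated as independent, which is the
    simplification the later literature identifies as inexact.
    """
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def bit_entropy(m: int, n: int, k: float) -> float:
    """Shannon entropy (in bits) of a single filter bit.

    p0 is the probability that the bit is still 0 after k*n
    uniform, independent hash insertions.
    """
    p0 = (1.0 - 1.0 / m) ** (k * n)
    if p0 <= 0.0 or p0 >= 1.0:
        return 0.0
    return -(p0 * math.log2(p0) + (1.0 - p0) * math.log2(1.0 - p0))


def entropy_optimal_k(m: int, n: int) -> float:
    """Real-valued k that makes p0 = 1/2, i.e. maximizes per-bit entropy.

    Solves (1 - 1/m)^(k*n) = 1/2 for k.
    """
    return math.log(0.5) / (n * math.log1p(-1.0 / m))


if __name__ == "__main__":
    m, n = 10_000, 1_000  # illustrative sizes, not taken from the paper
    k_real = entropy_optimal_k(m, n)
    # Compare with the integer k that minimizes Bloom's approximation.
    k_best = min(range(1, 21), key=lambda k: bloom_fpp(m, n, k))
    print(f"entropy-maximizing k = {k_real:.4f}")
    print(f"(m/n) * ln 2         = {(m / n) * math.log(2):.4f}")
    print(f"integer k minimizing Bloom's approximation = {k_best}")
```

For these illustrative sizes, the entropy-maximizing $k$ is about 6.93, and the integer minimizer of Bloom's approximation is the neighboring value 7, so in this simplified setting the two criteria agree without the false positive formula being used to pick $k$.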