Abstract: A filter is a widely used data structure for storing an approximation of a given set $S$ of elements from some universe $U$ (a countable set). It represents a superset $S'\supseteq S$ that is "close to $S$" in the sense that for $x\not\in S$, the probability that $x\in S'$ is bounded by some $\varepsilon > 0$. The advantage of using a Bloom filter, when some false positives are acceptable, is that the space usage becomes smaller than what is required to store $S$ exactly. Though filters are well-understood from a worst-case perspective, it is clear that state-of-the-art constructions may not be close to optimal for particular distributions of data and queries. Suppose, for instance, that some elements are in $S$ with probability close to 1. Then it would make sense to always include them in $S'$, saving space by not having to represent these elements in the filter. Questions like this have been raised in the context of Weighted Bloom filters (Bruck, Gao and Jiang, ISIT 2006) and Bloom filter implementations that make use of access to learned components (Vaidya, Knorr, Mitzenmacher, and Kraska, ICLR 2021). In this paper, we present a lower bound for the expected space that such a filter requires. We also show that the lower bound is asymptotically tight by exhibiting a filter construction that executes queries and insertions in worst-case constant time, and has a false positive rate at most $\varepsilon$ with high probability over input sets drawn from a product distribution. We also present a Bloom filter alternative, which we call the $\textit{Daisy Bloom filter}$, that executes operations faster and uses significantly less space than the standard Bloom filter.
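The filter contract described in the abstract, and the observation that near-certain elements need not be stored, can be illustrated with a minimal Python sketch. This is not the Daisy Bloom filter construction from the paper: `BloomFilter` is a textbook Bloom filter, and `PriorAwareFilter`, its `prior` callback, and the `threshold` parameter are hypothetical names introduced here only for illustration.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True for every inserted item; may also be True for others.
        return all(self.bits[pos] for pos in self._positions(item))


class PriorAwareFilter:
    """Hypothetical wrapper (an assumption, not the paper's method).

    Elements whose prior membership probability is at least `threshold`
    are always reported as present and never written to the inner filter.
    """

    def __init__(self, prior, threshold, num_bits, num_hashes):
        self.prior = prior          # assumed callable: item -> probability
        self.threshold = threshold  # assumed cutoff for "always include"
        self.inner = BloomFilter(num_bits, num_hashes)

    def add(self, item):
        if self.prior(item) < self.threshold:
            self.inner.add(item)

    def __contains__(self, item):
        return self.prior(item) >= self.threshold or item in self.inner


if __name__ == "__main__":
    likely = {"hot-key"}
    f = PriorAwareFilter(lambda x: 0.99 if x in likely else 0.01,
                         threshold=0.9, num_bits=1024, num_hashes=3)
    f.add("hot-key")   # costs no bits in the inner filter
    f.add("cold-key")  # stored in the inner filter as usual
    print("hot-key" in f, "cold-key" in f, sum(f.inner.bits))
```

In this sketch the represented superset $S'$ contains every element above the threshold whether or not it was inserted, so such elements consume no filter space; the price is that a query for one of them always answers yes. The paper's construction handles this trade-off far more carefully than a single cutoff.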

Stable Learned Bloom Filters for Data Streams

Two-layer partitioned and deletable deep bloom filter for large-scale membership query

PA-LBF: Prefix-Based and Adaptive Learned Bloom Filter for Spatial Data

A Critical Analysis of Classifier Selection in Learned Bloom Filters

Optimizing Bloom Filter: Challenges, Solutions, and Comparisons

A Model for Learned Bloom Filters and Related Structures

Split Bloom Filter

A Model for Learned Bloom Filters, and Optimizing by Sandwiching

Low Computational Cost Bloom Filters

Bloom Filter with Noisy Coding Framework for Multi-Set Membership Testing

Shed More Light on Bloom Filter's Variants

Bloofi: Multidimensional Bloom filters

scaleBF: A High Scalable Membership Filter using 3D Bloom Filter

Difference Bloom Filter: a Probabilistic Structure for Multi-set Membership Query

Noisy Bloom Filters for Multi-Set Membership Testing

Distance Sensitive Bloom Filters Without False Negatives

A Shifting Bloom Filter Framework for Set Queries

RobustBF: A High Accuracy and Memory Efficient 2D Bloom Filter

Cardinality computing: a new step towards fully representing multi-sets by bloom filters

False Negative Problem of Counting Bloom Filter

Daisy Bloom Filters