A Pareto Optimal Bloom Filter Family with Hash Adaptivity
Meng Li, Rongbiao Xie, Deyi Chen, Haipeng Dai, Rong Gu, He Huang, Wanchun Dou, Guihai Chen
DOI: https://doi.org/10.1007/s00778-022-00755-z
2022-01-01
The VLDB Journal
Abstract: The Bloom filter is a compact, memory-efficient probabilistic data structure that supports membership testing, i.e., checking whether an element is in a given set. However, because a Bloom filter maps each element with random hash functions, it offers little flexibility even when information about negative keys (elements that are not in the set) is available, especially when misidentifying different negative keys incurs different costs. The problem worsens when the hash functions are non-uniform, i.e., when they map elements into the Bloom filter non-uniformly. To address this problem, we propose a new hash adaptive Bloom filter (HABF) that supports customizing hash functions for keys. In addition, we propose a filter family that includes f-HABF (fast hashing version), c-HABF (cache-friendly version), and s-HABF (stacked version). We show that the HABF family is Pareto optimal among all compared filters in terms of accuracy and query latency. We conduct extensive experiments on representative datasets, and the results show that the HABF family outperforms the standard Bloom filter and its cutting-edge variants overall in terms of accuracy, construction/query time, and memory consumption. The source code is available at https://github.com/njulands/HashAdaptiveBF.
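To make the setting in the abstract concrete, the following is a minimal sketch of a standard Bloom filter together with a cost-weighted false positive measure over known negative keys. It is not the HABF construction from the paper or its repository. The class name `BloomFilter`, the helper `weighted_false_positive_cost`, the salted SHA-256 hashing scheme, and the per-key cost dictionary are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: k hash functions over an m-bit array.

    Illustrative baseline only; this is NOT the HABF from the paper.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Assumption: derive the k positions by salting SHA-256 with the
        # hash index. Every key uses the same k functions, which is the
        # rigidity that the abstract describes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(key))


def weighted_false_positive_cost(bf: BloomFilter, negative_costs: dict) -> float:
    """Sum the costs of known negative keys that the filter misidentifies.

    negative_costs maps each negative key (known not to be in the set)
    to a hypothetical misidentification cost.
    """
    return sum(cost for key, cost in negative_costs.items() if key in bf)


if __name__ == "__main__":
    bf = BloomFilter(num_bits=64, num_hashes=3)
    for key in ["alice", "bob", "carol"]:
        bf.add(key)
    # Hypothetical negative keys with unequal misidentification costs.
    negatives = {"dave": 10.0, "erin": 1.0, "frank": 0.5}
    print("in set:", "alice" in bf)
    print("weighted cost:", weighted_false_positive_cost(bf, negatives))
```

Because every key passes through the same fixed hash functions, this baseline has no way to steer an expensive negative key away from a collision. That is the gap the abstract says per-key hash customization is meant to close.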