Bloom Filter with Noisy Coding Framework for Multi-Set Membership Testing
Haipeng Dai, Jun Yu, Meng Li, Wei Wang, Alex X. Liu, Jinghao Ma, Lianyong Qi, Guihai Chen
DOI: https://doi.org/10.1109/tkde.2022.3199646
IF: 9.235
2022-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: This article is on designing a compact data structure for multi-set membership testing that allows fast set querying. Multi-set membership testing is a fundamental operation for computing systems. Most existing schemes for multi-set membership testing are built upon Bloom filters and fall short in either storage space cost or query speed. To address this issue, we propose Noisy Bloom Filter (NBF), Error Corrected Noisy Bloom Filter (NBF-E), and Data-driven Noisy Bloom Filter (NBF-D) in this paper. We optimize their misclassification and false positive rates by theoretical analysis and present criteria for selection between NBF, NBF-E, and NBF-D. The key novelty of the three schemes is to store set ID information in a compact but noisy way that allows fast recording and querying, and to use a denoising method for querying. In particular, NBF-E incorporates asymmetric error-correcting coding techniques into NBF, and NBF-D encodes set IDs based on their cardinality. To evaluate NBF, NBF-E, and NBF-D in comparison with the prior art, we conducted experiments using real-world network traces. The results show that NBF, NBF-E, and NBF-D significantly advance the state-of-the-art on multi-set membership testing.
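For readers unfamiliar with the problem setting, the following is a minimal sketch of the straightforward Bloom-filter baseline for multi-set membership testing: one standard Bloom filter per set, with a query probing every filter. It is not the NBF, NBF-E, or NBF-D scheme, whose encoding and denoising details are not given in the abstract. All names (`BloomFilter`, `build_filters`, `query`), the parameter values, and the use of SHA-256 as the hash family are assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (illustrative; not the paper's NBF)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index
        # (an assumed hash family, chosen only because it is in the stdlib).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


def build_filters(sets, num_bits=1024, num_hashes=3):
    """Build one Bloom filter per set ID (hypothetical helper)."""
    filters = {}
    for set_id, members in sets.items():
        bf = BloomFilter(num_bits, num_hashes)
        for member in members:
            bf.add(member)
        filters[set_id] = bf
    return filters


def query(filters, item):
    """Return every set ID whose filter reports the item as possibly present."""
    return [set_id for set_id, bf in filters.items() if bf.might_contain(item)]


if __name__ == "__main__":
    filters = build_filters({"A": ["10.0.0.1", "10.0.0.2"], "B": ["10.0.0.3"]})
    print(query(filters, "10.0.0.1"))
```

In this baseline, both storage and query time grow with the number of sets, since every query probes every filter. This illustrates the space and speed trade-off that the abstract says existing Bloom-filter-based schemes struggle with.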