Enabling Space-Time Efficient Range Queries with REncoder

Zhuochen Fan, Bowen Ye, Ziwei Wang, Zheng Zhong, Jiarui Guo, Yuhan Wu, Haoyu Li, Tong Yang, Yaofeng Tu, Zirui Liu, Bin Cui
DOI: https://doi.org/10.1007/s00778-024-00873-w
2024-01-01
Abstract: A range filter is a data structure that answers range membership queries. Range queries are common in modern applications, and range filters have gained increasing attention because they improve range query performance by ruling out empty range queries. However, state-of-the-art range filters, such as SuRF and Rosetta, suffer from either a high false positive rate or low throughput. In this paper, we propose a novel range filter, called REncoder. It organizes all prefixes of keys into a segment tree, and locally encodes the segment tree into a Bloom filter to accelerate queries. REncoder supports diverse workloads by adaptively choosing how many levels of the segment tree to store. In addition, we propose a customized blacklist optimization to further improve the accuracy of multi-round queries. We theoretically prove that the error of REncoder is bounded and derive the asymptotic space complexity under the bounded error. We conduct extensive experiments on both synthetic and real datasets. The experimental results show that REncoder outperforms all state-of-the-art range filters, and that the proposed blacklist optimization further reduces the false positive rate.
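To make the general idea concrete, the following is a minimal Python sketch of a prefix-based range filter backed by a Bloom filter. It is an illustration only and not the REncoder algorithm: it stores every key prefix as an independent Bloom filter entry and omits the local encoding, the adaptive choice of stored levels, and the blacklist optimization described in the abstract. The class name `PrefixBloomRangeFilter`, its parameters, and the hashing scheme are all assumptions made for this example.

```python
import hashlib


class PrefixBloomRangeFilter:
    """Illustrative range filter over fixed-width unsigned integer keys.

    Hypothetical sketch: each binary prefix of a key corresponds to one
    node of an implicit segment tree and is stored in a Bloom filter.
    """

    def __init__(self, key_bits=16, num_bits=1 << 16, num_hashes=3):
        self.key_bits = key_bits
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, level, prefix):
        # Derive num_hashes bit positions for the tree node (level, prefix).
        for i in range(self.num_hashes):
            data = f"{i}:{level}:{prefix}".encode()
            digest = hashlib.blake2b(data, digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def _add(self, level, prefix):
        for p in self._positions(level, prefix):
            self.bits[p >> 3] |= 1 << (p & 7)

    def _maybe_has(self, level, prefix):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(level, prefix))

    def insert(self, key):
        # Store every prefix of the key, from 1 bit up to the full key.
        for level in range(1, self.key_bits + 1):
            self._add(level, key >> (self.key_bits - level))

    def may_contain_range(self, lo, hi):
        # Assumes 0 <= lo <= hi < 2**key_bits.
        return self._search(0, 0, lo, hi)

    def _search(self, level, prefix, lo, hi):
        shift = self.key_bits - level
        node_lo = prefix << shift
        node_hi = node_lo + (1 << shift) - 1
        if node_hi < lo or node_lo > hi:
            return False  # node is disjoint from the query range
        if level > 0 and not self._maybe_has(level, prefix):
            return False  # no inserted key has this prefix
        if level == self.key_bits:
            return True   # a full key inside the range may exist
        return (self._search(level + 1, prefix << 1, lo, hi)
                or self._search(level + 1, (prefix << 1) | 1, lo, hi))
```

Because every prefix of every inserted key is recorded, a query over a range that contains an inserted key always returns `True`; a `True` answer for an empty range is possible and corresponds to a false positive of the underlying Bloom filter.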