GRF: A Global Range Filter for LSM-Trees with Shape Encoding

Hengrui Wang, Te Guo, Junzhao Yang, Huanchen Zhang
DOI: https://doi.org/10.1145/3654944
2024-01-01
Abstract: Log-structured merge-trees (LSM-trees) are widely used in key-value stores because of their excellent write performance. To reduce the read amplification caused by overlapping sorted runs, each file (i.e., SSTable) in an LSM-tree is typically associated with a point or range filter that avoids unnecessary I/Os to runs that do not contain the target key (or key range). However, as modern SSDs get faster, probing multiple in-memory filters per query often makes the system CPU-bound, which compromises its throughput. In this paper, we develop the Global Range Filter (GRF) for RocksDB, which reduces the number of filter probes per query to one. We follow the approach pioneered by Chucky of storing sorted-run IDs within the filter. However, we identify two practical challenges in building a global range filter: correctness under multi-version concurrency control and efficiency under frequent updates. We solve both challenges with a novel Shape Encoding algorithm. With further optimizations, GRF delivers dominant performance over state-of-the-art filters under different workloads when integrated into RocksDB.
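The core idea of a global range filter (one structure that records which sorted runs hold which keys, so a range query is answered with a single probe instead of one probe per SSTable filter) can be illustrated with a toy sketch. This is a hypothetical, simplified illustration only: the class name `ToyGlobalRangeFilter` and its methods are invented for this example, the structure is an exact sorted map rather than a succinct probabilistic filter, it does not implement Shape Encoding or address MVCC correctness, and it is not a RocksDB API.

```python
from bisect import bisect_left, bisect_right


class ToyGlobalRangeFilter:
    """Hypothetical, exact stand-in for a global range filter.

    Maps each key to the IDs of the sorted runs holding a version of it,
    so a range query consults one structure instead of one filter per run.
    Not succinct, and not GRF's Shape Encoding.
    """

    def __init__(self):
        self._keys = []   # sorted list of distinct keys
        self._runs = {}   # key -> set of run IDs containing that key

    def insert(self, key, run_id):
        """Record that run `run_id` contains a version of `key`."""
        if key not in self._runs:
            self._keys.insert(bisect_left(self._keys, key), key)
            self._runs[key] = set()
        self._runs[key].add(run_id)

    def query_range(self, lo, hi):
        """Return the run IDs that may contain a key in [lo, hi]."""
        start = bisect_left(self._keys, lo)
        end = bisect_right(self._keys, hi)
        result = set()
        for key in self._keys[start:end]:
            result |= self._runs[key]
        return result

    def on_compaction(self, merged_run_ids, new_run_id):
        """Rewrite run IDs after `merged_run_ids` are compacted into `new_run_id`.

        This toy version scans every key, which hints at why frequent
        updates are a cost concern for a global filter.
        """
        for ids in self._runs.values():
            if ids & merged_run_ids:
                ids -= merged_run_ids   # mutates the stored set in place
                ids.add(new_run_id)


f = ToyGlobalRangeFilter()
f.insert("apple", 1)
f.insert("banana", 2)
f.insert("banana", 3)
f.insert("cherry", 3)
print(f.query_range("b", "c"))   # runs that may hold keys in ["b", "c"]
```

A query then reads only the runs returned by the single probe. The `on_compaction` method shows the trade-off the abstract alludes to: because run IDs live inside the global structure, every compaction must update it, whereas per-SSTable filters are simply rebuilt for the new files.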