Memory-Efficient Search Trees for Database Management Systems

Huanchen Zhang
DOI: https://doi.org/10.1145/3448016.3461470
2021-01-01
Abstract: The growing cost gap between DRAM and storage together with increasing database sizes means that database management systems (DBMSs) now operate with a lower memory-to-storage size ratio than before. On the other hand, modern DBMSs rely on in-memory search trees (e.g., indexes and filters) to achieve high throughput and low latency. These search trees, however, consume a large portion of the total memory available to the DBMS. Existing compression techniques for search trees often rely on general-purpose block compression algorithms such as Snappy and LZ4. These algorithms, however, impose too much computational overhead for in-memory search trees because the DBMS is unable to operate directly on the index data without having to decompress it first. Simply getting rid of all or part of the search trees is also suboptimal because they are crucial to query performance. This dissertation seeks to address the challenge of building compact yet fast in-memory search trees to allow more efficient use of memory in data processing systems. We first present techniques to obtain maximum compression on fast static (i.e., read-optimized) search trees. We identified sources of memory waste in existing trees and designed new succinct data structures to show that we can push the memory consumption of a search tree to the theoretical limit without compromising its query performance. Next, we introduce the hybrid index architecture as a way to efficiently modify the aforementioned static data structures with bounded and amortized cost in performance and space. Finally, instead of structural compression, we approach the problem from an orthogonal direction by compressing the actual keys. We built a fast string compressor that can encode arbitrary input keys while preserving their order so that search trees can serve range queries directly based on compressed keys.
Together, these three pieces form a practical recipe for achieving memory efficiency in search trees and in DBMSs.
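
To make the idea of order-preserving key compression concrete, the following is a minimal sketch and not the dissertation's compressor. It assumes a much simpler scheme: every character seen in a sample of keys gets a fixed-width code assigned in sorted order, with code 0 reserved for padding, so that comparing the packed bytes gives the same result as comparing the original strings. The names `build_code_table` and `encode` are hypothetical, and unlike the compressor described in the abstract, this sketch can only encode characters that appear in the sample.

```python
import bisect

def build_code_table(sample_keys):
    """Assign fixed-width codes to characters in sorted order.

    Code 0 is reserved for padding, so real characters start at 1.
    (Illustrative assumption: all future keys use only these characters.)
    """
    alphabet = sorted(set("".join(sample_keys)))
    width = max(1, len(alphabet).bit_length())  # bits needed for codes 1..len(alphabet)
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    return table, width

def encode(key, table, width):
    """Pack the per-character codes into bytes, zero-padded to a byte boundary."""
    bits = 0
    for ch in key:
        bits = (bits << width) | table[ch]
    nbits = len(key) * width
    pad = (-nbits) % 8
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")

keys = ["app", "apple", "band", "banana", "can"]
table, width = build_code_table(keys)

# Sorting the compressed keys yields the same order as sorting the originals.
encoded = sorted(encode(k, table, width) for k in keys)

# Range query for keys in ["b", "c"), answered on compressed keys only.
lo = bisect.bisect_left(encoded, encode("b", table, width))
hi = bisect.bisect_left(encoded, encode("c", table, width))
in_range = encoded[lo:hi]
```

Because the codes have equal width and follow the sorted order of the characters, the first differing character decides the comparison in both the original and the encoded form, and reserving code 0 keeps a key strictly smaller than any key it is a prefix of. A sorted list with binary search stands in for the search tree here; the same comparison property is what would let a tree serve range queries without decompressing keys.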