Abstract: The growing cost gap between DRAM and storage, together with increasing database sizes, means that database management systems (DBMSs) now operate with a lower memory-to-storage size ratio than before. At the same time, modern DBMSs rely on in-memory search trees (e.g., indexes and filters) to achieve high throughput and low latency. These search trees, however, consume a large portion of the total memory available to the DBMS. Existing compression techniques for search trees often rely on general-purpose block compression algorithms such as Snappy and LZ4. These algorithms impose too much computational overhead for in-memory search trees because the DBMS cannot operate directly on the index data without first decompressing it. Simply getting rid of all or part of the search trees is also suboptimal because they are crucial to query performance. This dissertation seeks to address the challenge of building compact yet fast in-memory search trees to allow more efficient use of memory in data processing systems. We first present techniques to obtain maximum compression on fast static (i.e., read-optimized) search trees. We identified sources of memory waste in existing trees and designed new succinct data structures to show that we can push the memory consumption of a search tree to the theoretical limit without compromising its query performance. Next, we introduce the hybrid index architecture as a way to efficiently modify the aforementioned static data structures with bounded and amortized cost in performance and space. Finally, instead of structural compression, we approach the problem from an orthogonal direction by compressing the actual keys. We built a fast string compressor that can encode arbitrary input keys while preserving their order so that search trees can serve range queries directly based on compressed keys. Together, these three pieces form a practical recipe for achieving memory efficiency in search trees and in DBMSs.
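The order-preserving property described in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's compressor: it swaps in a simple weight-balanced alphabetic prefix code over single bytes, and the names `build_alphabetic_code` and `encode` are hypothetical. Bit strings are kept as text of "0" and "1" characters for readability rather than packed into machine words, and the sketch makes no claim about the compression rate it achieves.

```python
from collections import Counter


def build_alphabetic_code(sample_keys):
    """Give each byte value a prefix-free bit string whose order matches byte order."""
    counts = Counter(b for key in sample_keys for b in key)
    # Add-one smoothing (an assumption of this sketch) so that every possible
    # byte value receives a code, which lets arbitrary keys be encoded.
    weights = [counts.get(b, 0) + 1 for b in range(256)]
    codes = [""] * 256

    def split(lo, hi, prefix):
        # Byte values lo..hi-1 all share the bit prefix `prefix`.
        if hi - lo == 1:
            codes[lo] = prefix
            return
        total = sum(weights[lo:hi])
        running, cut = 0, lo + 1
        # Cut where the left part first holds at least half of the weight,
        # always leaving at least one symbol on each side.
        for i in range(lo, hi - 1):
            running += weights[i]
            cut = i + 1
            if running * 2 >= total:
                break
        split(lo, cut, prefix + "0")  # smaller byte values branch on 0
        split(cut, hi, prefix + "1")  # larger byte values branch on 1

    split(0, 256, "")
    return codes


def encode(key, codes):
    """Concatenate the per-byte codes of `key` (a bytes object)."""
    return "".join(codes[b] for b in key)


if __name__ == "__main__":
    sample = [b"com.gmail@alice", b"com.gmail@bob", b"org.acm"]
    codes = build_alphabetic_code(sample)
    encoded = sorted(encode(k, codes) for k in sample)
    # Sorting the encoded keys yields the same order as sorting the raw keys.
    assert encoded == [encode(k, codes) for k in sorted(sample)]
```

Because smaller byte values always take the 0 branch and no code is a prefix of another, comparing two encoded keys gives the same result as comparing the original keys. A range query can therefore encode its two bounds once and compare them against stored encoded keys without decoding anything.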

Order-Preserving Key Compression for In-Memory Search Trees

Memory-Efficient Search Trees for Database Management Systems

Blitzcrank: Fast Semantic Compression for In-Memory Online Transaction Processing

Upscaledb: Efficient Integer-Key Compression in a Key-Value Store using SIMD Instructions

Enabling Efficient Random Access to Hierarchically-Compressed Data

FPGA-Accelerated Compactions for LSM-based Key-Value Store

The HV-tree: a memory hierarchy aware version index

Compressed Key Sort and Fast Index Reconstruction

CompassDB: Pioneering High-Performance Key-Value Store with Perfect Hash

ZipCache: A DRAM/SSD Cache with Built-in Transparent Compression

Compressed Indexes for Fast Search of Semantic Data

An Optimization of Key-Value Store Based on Segmented LSM-Tree

SECOMPAX: A Bitmap Index Compression Algorithm

Optimal hierarchical layouts for cache-oblivious search trees

Parallelization Optimization of KD-Tree Building Algorithm

EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance

Effectively Compress KV Heads for LLM

CompressDB: Enabling Efficient Compressed Data Direct Processing for Various Databases

Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance