Revisiting Learned Index with Byte-addressable Persistent Storage

Rui Zhang, Yukai Huang, Sicheng Liang, Shangyi Sun, Shaonan Ma, Chengying Huan, Lulu Chen, Zhihui Lu, Yang Xu, Ming Yan, Jie Wu
DOI: https://doi.org/10.1145/3673038.3673113
2024-01-01
Abstract: Byte-addressable Persistent Storage (BPS), such as persistent memory and CXL-enabled SSDs, has become an extension of main memory. This opens up new possibilities for indexes that operate on and persist data directly over the memory bus. Recent learned indexes exploit data distribution and have shown great potential for some workloads. Although some work has integrated learned indexes into BPS, it is mainly based on Intel's first-generation persistent memory. Current designs suffer from the following problems: 1) excessive storage line accesses due to large nodes in learned indexes; 2) inefficient concurrency control due to volatile caches; 3) write amplification due to mismatched access granularity. To resolve these challenges, we introduce a new learned index named PFLX, featuring three key design improvements: 1) a Storage Line-Friendly Node Layout that enhances search and insertion operations and minimizes storage line accesses through strategic use of pointers; 2) a Persistent Cache-based Lock-free Concurrency Mechanism that utilizes atomic primitives to serialize operations in leaf nodes effectively; 3) a Selective Data Flush Mechanism designed to reduce write amplification. Evaluations show that PFLX outperforms state-of-the-art persistent indexes by 1.2x to 3x.