Enabling Efficient NVM-Based Text Analytics Without Decompression

Xiaokun Fang,Feng Zhang,Junxiang Nong,Mingxing Zhang,Puyun Hu,Yunpeng Chai,Xiaoyong Du
DOI: https://doi.org/10.1109/icde60146.2024.00286
2024-01-01
Abstract:Text analytics directly on compression (TADOC) is a promising technology designed for handling big data analytics. However, a substantial amount of DRAM is required for high performance, which limits its usage in many important scenarios where the capacity of DRAM is limited, such as memory-constrained systems. Non-volatile memory (NVM) is a novel storage technology that combines the advantage of reading per-formance and byte addressability of DRAM with the durability of traditional storage devices like SSD and HDD. Unfortunately, no research demonstrates how to use NVM to reduce DRAM utilization in compressed data analytics. In this paper, we propose N-TADOC, which substitutes DRAM with NVM while maintaining TADOC's analytics performance and space savings. Utilizing an NVM block device to reduce DRAM utilization presents two challenges, including poor data locality in traversing datasets and auxiliary data structure reconstruction on NVM. We develop novel designs to solve these challenges, including a pruning method with NVM pool management, bottom-up upper bound estimation, correspondent data structures, and persistence strategy at different levels of cost. Experimental results show that on four real-world datasets, N-TADOC achieves 2.04× performance speedup compared to the processing directly on the uncompressed data and 70.7% DRAM space saving compared to the original TADOC.
What problem does this paper attempt to address?