Enabling Efficient Random Access to Hierarchically Compressed Text Data on Diverse GPU Platforms
Yihua Hu,Feng Zhang,Yifei Xia,Zhiming Yao,Letian Zeng,Haipeng Ding,Zhewei Wei,Xiao Zhang,Jidong Zhai,Xiaoyong Du,Siqi Ma
DOI: https://doi.org/10.1109/tpds.2023.3294341
IF: 5.3
2023-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: The tremendous computing capacity of GPUs offers significant potential for processing hierarchically compressed text data without decompression. However, current GPU techniques support only traversal-based text data analytics; random access is exceedingly inefficient, which significantly limits their utility. To address this issue, we develop a novel and widely applicable solution that supports random access to hierarchically compressed text data without decompression in GPU memory. We address three main challenges in enabling efficient random access to compressed text data on GPUs. The first challenge is designing GPU data structures that facilitate random access. The second challenge is efficiently generating these data structures on the GPU: the CPU is inefficient at generating data structures for random access, and this inefficiency increases considerably when PCIe transmission is taken into account. The third challenge is query processing on compressed text data in GPU memory, where random accesses such as data updates cause massive conflicts among many threads. To address the first challenge, we develop several compressed GPU data structures, including indexing within the intricate GPU memory hierarchy. To handle the second challenge, we propose a two-phase process for producing these data structures on the GPU. For the third challenge, we propose a double-parsing design that avoids conflicts. We evaluate our solution on three platforms (two server-grade GPU platforms and one edge-grade GPU platform) using five real-world datasets. Experimental results show that random access operations on GPU achieve an average speedup of 52.98× compared to the state-of-the-art solution.
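To make "random access without decompression" concrete, the following is a minimal CPU-side sketch in Python. It is not the paper's GPU data structures, two-phase generation process, or double-parsing design. It assumes, purely for illustration, that the hierarchical compression is a grammar in which each rule is a sequence of words and references to other rules. The names `RULES`, `expansion_lengths`, and `word_at` are hypothetical. Once the number of words each rule expands to has been computed, the word at a given position can be located by descending the rule hierarchy and skipping whole sub-rules, without ever materializing the text.

```python
# Hypothetical toy grammar: rule id -> sequence of symbols.
# A symbol is either a word (str) or a reference to another rule (int).
# Rule 0 is assumed to be the root that expands to the full text.
RULES = {
    0: [1, "on", 2, 1],
    1: ["random", "access"],
    2: ["compressed", "text", 1],
}


def expansion_lengths(rules):
    """Return {rule_id: number of words the rule expands to}."""
    lengths = {}

    def length(rule_id):
        # Memoize so each rule is measured once, however often it is reused.
        if rule_id not in lengths:
            lengths[rule_id] = sum(
                length(s) if isinstance(s, int) else 1 for s in rules[rule_id]
            )
        return lengths[rule_id]

    for rule_id in rules:
        length(rule_id)
    return lengths


def word_at(rules, lengths, index, root=0):
    """Return the word at position `index` of the expanded text.

    Descends the rule hierarchy, skipping whole sub-rules by their
    precomputed lengths instead of expanding them.
    """
    if not 0 <= index < lengths[root]:
        raise IndexError(index)
    rule_id = root
    while True:
        for symbol in rules[rule_id]:
            span = lengths[symbol] if isinstance(symbol, int) else 1
            if index < span:
                if isinstance(symbol, int):
                    rule_id = symbol  # target lies inside this sub-rule
                    break
                return symbol  # target is this word
            index -= span  # skip the symbol without expanding it
```

For example, after `lengths = expansion_lengths(RULES)`, the call `word_at(RULES, lengths, 3)` walks from rule 0 into rule 2 and returns `"compressed"`. Each lookup costs time proportional to the rule sizes along one root-to-leaf path rather than to the length of the uncompressed text.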