General-purpose GPU Hashing Data Structures and Their Application in Accelerated Genomics

Daniel Juenger, Robin Kobus, Andre Mueller, Christian Hundt, Kai Xu, Weiguo Liu, Bertil Schmidt
DOI: https://doi.org/10.1016/j.jpdc.2022.01.006
IF: 4.542
2022-01-01
Journal of Parallel and Distributed Computing
Abstract: A broad variety of applications relies on associative data structures that exclusively support insert, retrieve, and delete operations. Hash maps represent one class of effective dictionary implementations. Properties such as amortized constant time complexity for these table operations and a compact memory layout make them versatile data structures with a wide range of applications in data analytics and artificial intelligence. The rapidly growing amount of data emerging in many scientific fields can often only be tackled with modern massively parallel accelerators such as GPUs. Numerous GPU hash table implementations have been proposed in recent years. However, most of these implementations lack the flexibility needed for use in existing analytics pipelines or suffer from significant performance degradation in certain application scenarios. As a more recent approach, the WarpCore framework aims to alleviate these restrictions by focusing on both versatility and performance. In this work we review the key concepts of the WarpCore library and provide an extensive performance evaluation against the state of the art. We further explore how WarpCore can be used to accelerate two bioinformatics applications (metagenomic classification and k-mer counting), achieving significant speedups.

Highlights:
• Many applications rely on associative data structures such as hash maps.
• Growing amounts of data emerging in many fields can often only be tackled by GPUs.
• GPU hash maps offer massive speedups over CPU-based implementations.
• Using the WarpCore library, we can easily accelerate common bioinformatics pipelines.
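The abstract names the three dictionary operations (insert, retrieve, delete) and k-mer counting as a target application. The following Python sketch illustrates one common way these fit together, under stated assumptions: an open-addressing hash table with linear probing and tombstones for deletion, keyed by k-mers packed 2 bits per base into integers. It is sequential CPU code meant only to show the table logic; it is not WarpCore's API or its probing scheme, and the names `LinearProbingTable`, `encode_kmers`, `EMPTY`, and `TOMBSTONE` are hypothetical. In a GPU implementation many threads insert concurrently, so claiming a free slot would typically rely on an atomic compare-and-swap rather than a plain write.

```python
from typing import Iterator, List, Optional

# Hypothetical illustration; not the WarpCore API.
# Sentinel keys. Valid keys must be non-negative (true for 2-bit-packed k-mers).
EMPTY = -1      # slot never used: a probe for a missing key can stop here
TOMBSTONE = -2  # slot whose key was deleted: probes must continue past it

_BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmers(seq: str, k: int) -> Iterator[int]:
    """Yield every k-mer of seq as an integer with 2 bits per base.

    For k <= 32 each key fits in 64 bits. Non-ACGT characters (e.g. 'N')
    break the window, so no k-mer spanning them is emitted.
    """
    mask = (1 << (2 * k)) - 1
    key = 0
    run = 0  # consecutive valid bases ending at the current position
    for base in seq.upper():
        code = _BASE_CODE.get(base)
        if code is None:
            key, run = 0, 0
            continue
        key = ((key << 2) | code) & mask  # shift in the new base, drop the oldest
        run += 1
        if run >= k:
            yield key


class LinearProbingTable:
    """Fixed-capacity open-addressing hash table (sequential CPU sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.keys: List[int] = [EMPTY] * capacity
        self.values: List[int] = [0] * capacity

    def _probe(self, key: int) -> Iterator[int]:
        # Multiplicative hash, then visit consecutive slots (linear probing).
        start = ((key * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF) % self.capacity
        for i in range(self.capacity):
            yield (start + i) % self.capacity

    def insert(self, key: int, delta: int = 1) -> None:
        """Add delta to key's value, creating the entry if the key is absent."""
        target: Optional[int] = None
        for s in self._probe(key):
            k = self.keys[s]
            if k == key:  # key already present: update in place
                self.values[s] += delta
                return
            if k == TOMBSTONE and target is None:
                target = s  # reusable slot, but keep probing in case key is further on
            elif k == EMPTY:
                if target is None:
                    target = s
                break  # end of the probe chain: key is absent
        if target is None:
            raise RuntimeError("hash table is full")
        # On a GPU, claiming this slot would typically be an atomic compare-and-swap.
        self.keys[target] = key
        self.values[target] = delta

    def retrieve(self, key: int) -> Optional[int]:
        """Return the value stored for key, or None if it is absent."""
        for s in self._probe(key):
            k = self.keys[s]
            if k == key:
                return self.values[s]
            if k == EMPTY:
                return None
        return None

    def delete(self, key: int) -> bool:
        """Mark key's slot as a tombstone; return whether the key was present."""
        for s in self._probe(key):
            k = self.keys[s]
            if k == key:
                self.keys[s] = TOMBSTONE
                return True
            if k == EMPTY:
                return False
        return False


# Count 3-mers; the 'N' splits the read, so no k-mer spans it.
table = LinearProbingTable(capacity=64)
for kmer in encode_kmers("ACGTACGTNACG", k=3):
    table.insert(kmer)

print(table.retrieve(6))   # ACG = 0b000110 -> 3
print(table.retrieve(27))  # CGT = 0b011011 -> 2
```

Tombstones let `delete` free a slot without breaking the probe chains of keys stored further along; the cost is that lookups must step over them, which is one reason fixed-capacity open-addressing tables are usually kept well below full load.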
What problem does this paper attempt to address?