Towards a General and Efficient Linked-List Hash Table on GPUs.
Lan Gao,Yunlong Xu,Chongyang Xu,Rui Wang,Hailong Yang,Zhongzhi Luan,Depei Qian
DOI: https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00201
2019-01-01
Abstract:Hash table is a fundamental indexing data structure, and extensively used in many applications. Recently, several hash table implementations have been proposed on GPUs. However, they have two pitfalls. Firstly, their synchronization approaches can only ensure the atomicity among individual operations. This makes them can hardly be used in some real-world applications (e.g., online transaction processing, OLTP), which require groups of operations to be executed atomically. Secondly, they use either a GPU thread or a warp of GPU threads to process an operation at a time (named thread-and warp-level execution respectively). The former one incurs more branch divergences, while the latter one requires all threads within a warp to be active. To this end, we (1) propose a readers-writer lock based, bucket-level synchronization to realize the atomicity among both individual and groups of operations, and (2) combine thread-and warp-level executions to optimize system performance. Moreover, we optimize the data structure of linked-list with respect to GPU architectures. Putting them together, we realize a general and efficient linked-list hash table on GPUs, named GEL-HASH. Compared with existing GPU hash table implementations, GEL-HASH can easily support more real-world applications. Moreover, we compare GEL-HASH with the state-of-the-art hash table implementation on GPUs (slab hash) using applications that are well supported by the existing implementations. The results show that, for small data scales (numbers of key-value pairs stored in a hash table) GEL-HASH outperforms slab hash by up to 3.05X, while for large data scales GEL-HASH exhibits less than 20% performance slowdown.