RegTT: Accelerating Tree Traversals on GPUs by Exploiting Regularities

Feng Zhang, Peng Di, Hao Zhou, Xiangke Liao, Jingling Xue
DOI: https://doi.org/10.1109/icpp.2016.71
2016-01-01
Abstract: Tree traversals are widely used irregular applications. Given a tree traversal algorithm, where a single tree is traversed by multiple queries (with truncation), its efficient parallelization on GPUs is hindered by branch divergence, load imbalance and memory-access irregularity, as the nodes and their visitation orders differ greatly under different queries. We leverage a key insight made on several truncation-induced tree traversal regularities to enable as many threads in the same warp as possible to visit the same node simultaneously, thereby enhancing both GPU resource utilization and memory coalescing at the same time. We introduce a new parallelization approach, REGTT, to orchestrate an efficient execution of a tree traversal algorithm on GPUs by starting with BFT (Breadth-First Traversal), then reordering the queries being processed (based on their truncation histories), and finally, switching to DFT (Depth-First Traversal). REGTT is general (without relying on domain-specific knowledge) and automatic (as a source-code transformation). For a set of five representative benchmarks used, REGTT outperforms the state-of-the-art by 1.66x on average.
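The three phases named in the abstract (BFT, query reordering by truncation history, then DFT) can be illustrated with a small CPU-side sketch. This is not the authors' implementation or their source-code transformation: the names `Node`, `build_tree`, `plain_dft`, `regtt_like`, the `truncate` callback and the `bft_levels` parameter are all hypothetical, and the sketch runs sequentially in Python rather than on GPU warps. It only shows how queries with identical truncation histories end up adjacent after sorting, which is the property that would let neighbouring threads visit the same nodes.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(depth: int, value: int = 1) -> Optional[Node]:
    """Complete binary tree with heap-style numbering (hypothetical test data)."""
    if depth == 0:
        return None
    return Node(value,
                build_tree(depth - 1, 2 * value),
                build_tree(depth - 1, 2 * value + 1))


def plain_dft(root: Node, query: Any,
              truncate: Callable[[Any, Node], bool]) -> List[int]:
    """Baseline: depth-first traversal that skips a subtree once truncated."""
    visited, stack = [], [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        visited.append(node.value)
        if truncate(query, node):
            continue  # truncation: do not descend below this node
        stack.append(node.right)
        stack.append(node.left)
    return visited


def regtt_like(root: Node, queries: List[Any],
               truncate: Callable[[Any, Node], bool],
               bft_levels: int) -> Tuple[List[int], List[List[int]]]:
    """Assumed three-phase scheme: BFT, reorder by history, then DFT."""
    visited: List[List[int]] = [[] for _ in queries]
    history: List[List[bool]] = [[] for _ in queries]
    level = [root]
    active = [{id(root)} for _ in queries]  # nodes each query still reaches

    # Phase 1: breadth-first over the top levels, all queries per node.
    for _ in range(bft_levels):
        next_active = [set() for _ in queries]
        for node in level:
            for qi, q in enumerate(queries):
                if id(node) not in active[qi]:
                    history[qi].append(True)  # unreachable: record as cut
                    continue
                visited[qi].append(node.value)
                cut = truncate(q, node)
                history[qi].append(cut)
                if not cut:
                    for child in (node.left, node.right):
                        if child is not None:
                            next_active[qi].add(id(child))
        level = [c for n in level for c in (n.left, n.right) if c is not None]
        active = next_active

    # Phase 2: queries with equal truncation histories become neighbours.
    order = sorted(range(len(queries)), key=lambda qi: history[qi])

    # Phase 3: depth-first from each query's surviving frontier nodes.
    for qi in order:
        for node in level:
            if id(node) in active[qi]:
                visited[qi].extend(plain_dft(node, queries[qi], truncate))
    return order, visited
```

In this sketch each query visits the same set of nodes as a plain DFT would, only in a different order; on a GPU, the sorted `order` would presumably determine how queries are assigned to threads within a warp.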