Understanding the SIMD Efficiency of Graph Traversal on GPU

Yichao Cheng, Hong An, Zhitao Chen, Feng Li, Zhaohui Wang, Xia Jiang, Yi Peng
DOI: https://doi.org/10.1007/978-3-319-11197-1_4
2014-01-01
Abstract: Graphs are a widely used data structure, and graph algorithms such as breadth-first search (BFS) are key components of many applications. Recent studies have attempted to accelerate graph algorithms on highly parallel graphics processing units (GPUs). Although many graph algorithms on large graphs exhibit abundant parallelism, their performance on GPUs still faces formidable challenges, one of which is mapping the irregular computation onto the GPU's vectorized execution model. In this paper, we investigate the link between graph topology and the performance of BFS on GPUs. We introduce a novel model to analyze the components of SIMD underutilization. We show that SIMD lanes are wasted either because of workload imbalance between tasks or because of the heterogeneity of each task. We also develop corresponding metrics to quantify the SIMD efficiency of BFS on GPUs. Finally, we demonstrate the applicability of the metrics by using them to profile the performance of different mapping strategies.
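To make the idea of wasted SIMD lanes concrete, the following is a minimal sketch of one way lane utilization could be estimated for a single BFS frontier. It is not the model or the metrics from the paper, which the abstract does not spell out. It assumes a simple mapping where each SIMD lane expands one frontier vertex, a SIMD width of 32, and that a group of lanes stays busy until its longest adjacency list is finished. The function name `simd_efficiency` and its parameters are hypothetical, and the sketch covers only workload imbalance between tasks, not the heterogeneity of each task.

```python
from typing import Sequence

WARP_SIZE = 32  # assumed SIMD width, for illustration only


def simd_efficiency(frontier_degrees: Sequence[int],
                    warp_size: int = WARP_SIZE) -> float:
    """Fraction of issued lane-steps that do useful work.

    Assumes one frontier vertex per lane: each group of `warp_size`
    lanes iterates until its longest adjacency list is exhausted, so
    lanes holding shorter lists (or no vertex at all) sit idle.
    """
    useful = 0  # lane-steps that actually inspect a neighbor
    issued = 0  # lane-steps the hardware would spend in total
    for start in range(0, len(frontier_degrees), warp_size):
        group = frontier_degrees[start:start + warp_size]
        steps = max(group)            # the group runs as long as its slowest lane
        useful += sum(group)
        issued += steps * warp_size   # empty lanes in a partial group count as waste
    # With no work issued there is nothing to waste; report full efficiency.
    return useful / issued if issued else 1.0


if __name__ == "__main__":
    uniform = [4] * 32          # every vertex has the same degree
    skewed = [1] * 31 + [100]   # one high-degree vertex dominates the group
    print(simd_efficiency(uniform))
    print(simd_efficiency(skewed))
```

Under these assumptions, a frontier with uniform degrees keeps every lane busy, while a single high-degree vertex among low-degree ones leaves most lanes idle for most steps. This is the kind of topology-dependent behavior the abstract attributes to workload imbalance between tasks.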