An Adaptive Breadth-First Search Algorithm on Integrated Architectures
Feng Zhang, Heng Lin, Jidong Zhai, Jie Cheng, Dingyi Xiang, Jizhong Li, Yunpeng Chai, Xiaoyong Du
DOI: https://doi.org/10.1007/s11227-018-2525-0
IF: 3.3
2018-01-01
The Journal of Supercomputing
Abstract: In the big data era, graph applications are becoming increasingly important for data analysis. Breadth-first search (BFS) is one of the most representative graph algorithms, so accelerating BFS with graphics processing units (GPUs) is an active research topic. However, because BFS accesses data in a random pattern, it is difficult to take full advantage of the power of GPUs. Recently, hardware designers have integrated CPUs and GPUs on the same chip, allowing both devices to share physical memory, which makes switching between the CPU and the GPU cheap. BFS processing can be divided into several levels, and various traversal orders can be used at each level. Using different traversal orders on different devices (CPUs or GPUs) yields different performance. Thus, the challenge in using BFS on integrated architectures is how to select the traversal order and the device for each level. Previous works have failed to address this problem effectively. In this study, we propose an adaptive performance model that automatically finds a suitable traversal order and device for each level. We evaluated our method on Graph500, where it not only shows the best energy efficiency but also achieves approximately 2.1 giga-traversed edges per second (GTEPS), a 2.3× speedup over the state-of-the-art BFS on integrated architectures.
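To make the idea of choosing a traversal order per level concrete, here is a minimal Python sketch of a level-synchronous BFS that switches between top-down and bottom-up traversal. It is not the paper's performance model: the switching rule, the parameter `alpha`, and the function name `bfs_adaptive` are hypothetical placeholders, the graph is assumed to be undirected and stored as adjacency lists, and device selection (CPU vs. GPU) is left out entirely.

```python
from typing import List, Tuple


def bfs_adaptive(adj: List[List[int]], source: int,
                 alpha: float = 4.0) -> Tuple[List[int], List[str]]:
    """Level-synchronous BFS that picks a traversal order at every level.

    adj    : undirected graph as adjacency lists (assumption of this sketch)
    alpha  : hypothetical tuning knob for the placeholder switching rule
    Returns the depth of every vertex (-1 if unreachable) and the order
    chosen at each level.
    """
    n = len(adj)
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    # Edges incident to vertices that have not been visited yet.
    unvisited_edges = sum(len(nbrs) for nbrs in adj) - len(adj[source])
    choices: List[str] = []
    level = 0

    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        # Placeholder rule (NOT the paper's model): go bottom-up once the
        # frontier covers a large share of the remaining edges.
        if frontier_edges * alpha > unvisited_edges:
            order = "bottom-up"
        else:
            order = "top-down"
        choices.append(order)

        next_frontier: List[int] = []
        if order == "top-down":
            # Each frontier vertex pushes to its unvisited neighbours.
            for u in frontier:
                for v in adj[u]:
                    if depth[v] == -1:
                        depth[v] = level + 1
                        next_frontier.append(v)
        else:
            # Each unvisited vertex looks for any parent in the frontier.
            in_frontier = set(frontier)
            for v in range(n):
                if depth[v] == -1:
                    for u in adj[v]:
                        if u in in_frontier:
                            depth[v] = level + 1
                            next_frontier.append(v)
                            break

        unvisited_edges -= sum(len(adj[v]) for v in next_frontier)
        frontier = next_frontier
        level += 1

    return depth, choices
```

Both orders compute the same depths; only the amount of work per level differs. An adaptive scheme like the one described in the abstract would replace the placeholder rule with a model that also decides which device runs each level.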