Breadth-First Search with A Multi-Core Computer.

Maryia Belova, Ming Ouyang
DOI: https://doi.org/10.1109/ipdpsw.2017.48
2017-01-01
Abstract: Breadth-first search is a building block of many graph algorithms. Because BFS is memory-bound, parallelizing BFS on a multi-core computer must consider issues of data hazards, effects of atomic operations on memory throughput, and the size of the last level cache. Additionally, graph algorithms must cope with non-sequential memory access, which defeats cache prefetching and leads to a high cache miss rate. This article describes how to limit the maximum size of the data structure, how to perform parallel BFS without atomic operations, how to increase the proportion of sequential memory access, and how to reduce cache contention. These techniques have been used in various forms in the literature. The present work puts them together in a simple way that works well. Three leading platforms of graph algorithms - Gunrock, Ligra, and Polymer - are used for comparison. When executed on the same machine, Ligra is the fastest among the three. The implementation described herein is always faster than Ligra, and is more than twice as fast for large graphs. In particular, for the graph RMat26, it is 3.11 times the speed of Ligra.
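
The abstract does not say how the authors avoid atomic operations, so the following is only a minimal sketch of one well-known way to do it, not the paper's implementation. It uses a "bottom-up" level-synchronous step: every unvisited vertex scans its own neighbor list for a vertex in the current frontier, so each thread writes only to entries of vertices it owns and the shared frontier array is read-only during a level. The names `Csr`, `bfs_levels`, `cur`, and `next` are hypothetical, the graph is assumed to be undirected and stored in CSR form, and plain `std::thread` is used instead of any particular parallel runtime.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Graph in CSR form: the neighbors of v are adj[offset[v]] .. adj[offset[v+1]-1].
// Assumption: the graph is undirected (each edge is stored in both directions),
// because the bottom-up step finds a parent by scanning a vertex's own neighbors.
struct Csr {
    std::vector<std::size_t> offset;
    std::vector<int> adj;
};

// Hypothetical helper: returns the BFS level of every vertex from `source`
// (-1 if unreachable). Assumes the graph has at least one vertex.
std::vector<int> bfs_levels(const Csr& g, int source, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    const std::size_t n = g.offset.size() - 1;
    std::vector<int> level(n, -1);
    // One byte per vertex (not vector<bool>), so entries written by different
    // threads are distinct memory locations.
    std::vector<char> cur(n, 0), next(n, 0);
    std::vector<char> found(num_threads, 0);
    level[source] = 0;
    cur[source] = 1;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    for (int depth = 1;; ++depth) {
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < num_threads; ++t) {
            const std::size_t lo = std::min(n, t * chunk);
            const std::size_t hi = std::min(n, lo + chunk);
            workers.emplace_back([&, t, lo, hi] {
                char any = 0;
                for (std::size_t v = lo; v < hi; ++v) {
                    next[v] = 0;                  // only the owner writes next[v]
                    if (level[v] != -1) continue; // already visited
                    for (std::size_t e = g.offset[v]; e < g.offset[v + 1]; ++e) {
                        if (cur[g.adj[e]]) {      // cur is read-only in this level
                            level[v] = depth;     // only the owner writes level[v]
                            next[v] = 1;
                            any = 1;
                            break;
                        }
                    }
                }
                found[t] = any;                   // one slot per thread
            });
        }
        for (auto& w : workers) w.join();         // barrier between levels
        if (std::none_of(found.begin(), found.end(),
                         [](char c) { return c != 0; })) break;
        cur.swap(next);                           // the new frontier becomes current
    }
    return level;
}
```

Because no two threads ever write the same location, the sketch needs neither atomics nor locks. The cost is that every level scans all vertices, which is wasteful on high-diameter graphs; it also says nothing about the cache-related techniques the abstract mentions.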