Abstract: Breadth-first search (BFS) is a widely used graph algorithm. It is data-intensive, and its data accesses are random and non-contiguous. Data-access latency plays an important role in the algorithm's running time on shared memory computers, since it can hardly be reduced by processor techniques such as dynamic instruction execution and data prefetching. This work focuses on partitioning computation for BFS on shared memory computers. The goal is to improve data-access efficiency and optimize load balance among processors. A data-centric parallel computing model is presented. The model provides a partitioned, hierarchical data view for each processor and automatically assigns the computation on each data partition to a set of processors that have the same data view. This computation partitioning mechanism allows applications to minimize data-access collisions among processors. A BFS equipped with the data-centric computation partitioning mechanism has been implemented. Two strategies are introduced to further improve its performance. One improves vertex-access efficiency by representing the status of vertices with a bitmap. The other improves load balance by dynamically adjusting every processor's workload. The model and the strategies have been evaluated with both real and synthetic graphs. Compared with a BFS without the data-centric computation partitioning mechanism, the new BFS achieves a 1.8-2.6× speedup. We believe this mechanism is also applicable to other graph applications.
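The bitmap strategy mentioned in the abstract can be illustrated with a minimal, sequential sketch. It is not the authors' implementation: the function name `bfs_levels`, the helper `test_and_set`, and the adjacency-list input are assumptions made for illustration, and the data-centric partitioning and dynamic load balancing are left out. The sketch only shows the idea of keeping one visited bit per vertex, so that the status of many vertices fits in a single cache line.

```python
from typing import List


def bfs_levels(adj: List[List[int]], source: int) -> List[int]:
    """Level-synchronous BFS; returns each vertex's level, or -1 if unreachable.

    Visited status is stored in a bitmap (one bit per vertex) rather than
    one byte or word per vertex. Hypothetical helper, sequential only.
    """
    n = len(adj)
    visited = bytearray((n + 7) // 8)  # ceil(n / 8) bytes, all bits zero
    level = [-1] * n

    def test_and_set(v: int) -> bool:
        """Mark v as visited; return True only if it was unvisited before."""
        byte, mask = v >> 3, 1 << (v & 7)
        if visited[byte] & mask:
            return False
        visited[byte] |= mask
        return True

    test_and_set(source)
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if test_and_set(v):  # first time v is reached
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

In a parallel shared memory version, the test-and-set step would need an atomic operation or a per-partition ownership rule so that two processors do not claim the same vertex; the abstract suggests that assigning each data partition to processors with the same data view serves to limit such collisions.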

Breadth-First Search with A Multi-Core Computer.

Optimizing Data Accesses for Breadth-First Search on Shared Memory Computers.

Highly Efficient Breadth-First Search on CPU-Based Single-Node System

Accelerating Breadth-First Graph Search on a Single Server by Dynamic Edge Trimming

Fast and Efficient Parallel Breadth-First Search with Power-law Graph Transformation

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores

ABi-BFS: A High-performance Parallel Breadth-First Search on Shared-memory Systems

FastBFS: Fast Breadth-First Graph Search on a Single Server.

Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data

Understanding Parallelism in Graph Traversal on Multi-Core Clusters

Parallel Cluster-BFS and Applications to Shortest Paths

An Adaptive Breadth-First Search Algorithm on Integrated Architectures

Load-Balanced Breadth-First Search on GPUs

Optimizations to the Parallel Breath First Search on Distributed Memory

Designing and implementing a heuristic cross-architecture combination for graph traversal.

An OpenMP‐based breadth‐first search implementation using the bag data structure

FlexBFS: A Parallelism-Aware Implementation of Breadth-First Search on GPU

Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems

TurboBFS: GPU Based Breadth-First Search (BFS) Algorithms in the Language of Linear Algebra

Breadth-first heuristic search