Abstract: The performance bottlenecks of graph applications depend not only on the algorithm and the underlying hardware, but also on the size and structure of the input graph. As a result, programmers must try different combinations of a large set of techniques, which make tradeoffs among locality, work-efficiency, and parallelism, to develop the best implementation for a specific algorithm and type of graph. Existing graph frameworks and domain-specific languages (DSLs) lack flexibility, supporting only a limited set of optimizations. This paper introduces GraphIt, a new DSL for graph computations that generates fast implementations for algorithms with different performance characteristics running on graphs with different sizes and structures. GraphIt separates what is computed (algorithm) from how it is computed (schedule). Programmers specify the algorithm using an algorithm language, and performance optimizations are specified using a separate scheduling language. The algorithm language simplifies expressing the algorithms, while exposing opportunities for optimizations. We formulate graph optimizations, including edge traversal direction, data layout, parallelization, cache, NUMA, and kernel fusion optimizations, as tradeoffs among locality, parallelism, and work-efficiency. The scheduling language enables programmers to easily search through this complicated tradeoff space by composing together a large set of edge traversal, vertex data layout, and program structure optimizations. The separation of algorithm and schedule also enables us to build an autotuner on top of GraphIt to automatically find high-performance schedules. The compiler uses a new scheduling representation, the graph iteration space, to model, compose, and ensure the validity of the large number of optimizations. We evaluate GraphIt’s performance with seven algorithms on graphs with different structures and sizes. GraphIt outperforms the next fastest of six state-of-the-art shared-memory frameworks (Ligra, Green-Marl, GraphMat, Galois, Gemini, and Grazelle) on 24 out of 32 experiments by up to 4.8×, and is never more than 43% slower than the fastest framework on the other experiments. GraphIt also reduces the lines of code by up to an order of magnitude compared to the next fastest framework.
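
The separation of algorithm from schedule described in the abstract can be illustrated with a small sketch. This is plain Python, not GraphIt syntax, and every name in it (`Schedule`, `edgeset_apply`, `bfs`, `make_graph`, `reachable`) is a hypothetical chosen for illustration. It covers only one of the listed optimizations, edge traversal direction, and assumes a breadth-first search as the algorithm: the `bfs` function states what is computed, while the `Schedule` object decides whether each step pushes from the frontier or pulls into unvisited vertices.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this is NOT GraphIt syntax, only an
# illustration of keeping the algorithm apart from the schedule.


@dataclass
class Schedule:
    # "push": iterate over frontier vertices and their out-edges.
    # "pull": iterate over unvisited vertices and their in-edges.
    direction: str = "push"


def make_graph(num_vertices, edges):
    """Build out- and in-adjacency lists for a directed graph."""
    out_nbrs = [[] for _ in range(num_vertices)]
    in_nbrs = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)
    return out_nbrs, in_nbrs


def edgeset_apply(graph, frontier, parent, schedule):
    """One frontier expansion; the schedule picks how edges are traversed."""
    out_nbrs, in_nbrs = graph
    next_frontier = set()
    if schedule.direction == "push":
        for src in frontier:
            for dst in out_nbrs[src]:
                if parent[dst] == -1:
                    parent[dst] = src
                    next_frontier.add(dst)
    elif schedule.direction == "pull":
        for dst in range(len(in_nbrs)):
            if parent[dst] != -1:
                continue  # already visited
            for src in in_nbrs[dst]:
                if src in frontier:
                    parent[dst] = src
                    next_frontier.add(dst)
                    break  # one parent is enough, skip remaining in-edges
    else:
        raise ValueError(f"unknown direction: {schedule.direction}")
    return next_frontier


def bfs(graph, root, schedule):
    """The algorithm: it never mentions the traversal direction."""
    parent = [-1] * len(graph[0])
    parent[root] = root
    frontier = {root}
    while frontier:
        frontier = edgeset_apply(graph, frontier, parent, schedule)
    return parent


def reachable(parent):
    """Vertices that the search reached."""
    return [v for v, p in enumerate(parent) if p != -1]


graph = make_graph(6, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
push_parent = bfs(graph, 0, Schedule(direction="push"))
pull_parent = bfs(graph, 0, Schedule(direction="pull"))
```

Both schedules reach the same set of vertices; only the amount and order of work differ. Push touches just the out-edges of the frontier, while pull scans every unvisited vertex but can stop at the first frontier neighbor it finds, which is the kind of work-efficiency tradeoff the abstract refers to.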

Shogun: A Task Scheduling Framework for Graph Mining Accelerators

FINGERS: exploiting fine-grained parallelism in graph mining accelerators

TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing

DyGA: A Hardware-Efficient Accelerator with Traffic-Aware Dynamic Scheduling for Graph Convolutional Networks

Grapher: A Reconfigurable Graph Computing Accelerator with Optimized Processing Elements

An optimized architecture for accelerating graph computing on FPGAs

Response Time Analysis of Parallel Tasks on Accelerator-Based Heterogeneous Platforms

DIMMining: Pruning-Efficient and Parallel Graph Mining on Near-Memory-Computing

A Ubiquitous Machine Learning Accelerator With Automatic Parallelization on FPGA

Lattice-based Scheduling for Multi-FPGA Systems

GraphIt: a high-performance graph DSL

Exploiting the Task-Pipelined Parallelism of Stream Programs on Many-Core GPUs

HyTGraph: GPU-Accelerated Graph Processing with Hybrid Transfer Management

HSIP: A Novel Task Scheduling Algorithm for Heterogeneous Computing

An Optimal Locality-Aware Task Scheduling Algorithm Based on Bipartite Graph Modelling for Spark Applications

Accelerate Solving Expensive Scheduling by Leveraging Economical Auxiliary Tasks

NO2: Speeding Up Parallel Processing of Massive Compute-Intensive Tasks

Hypergraph-partitioning-based online joint scheduling of tasks and data

Accelerating Graph Analytics by Co-Optimizing Storage and Access on an FPGA-HMC Platform

OmniGraph: A Scalable Hardware Accelerator for Graph Processing

Streaming Task Graph Scheduling for Dataflow Architectures