Allo: A Programming Model for Composable Accelerator Design

Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang

DOI: https://doi.org/10.1145/3656401

2024-04-07

Abstract: Special-purpose hardware accelerators are increasingly pivotal for sustaining performance improvements in emerging applications, especially as the benefits of technology scaling continue to diminish. However, designers currently lack effective tools and methodologies to construct complex, high-performance accelerator architectures in a productive manner. Existing high-level synthesis (HLS) tools often require intrusive source-level changes to attain satisfactory quality of results. Despite the introduction of several new accelerator design languages (ADLs) aiming to enhance or replace HLS, their advantages are more evident in relatively simple applications with a single kernel. Existing ADLs prove less effective for realistic hierarchical designs with multiple kernels, even if the design hierarchy is flattened.

Programming Languages, Hardware Architecture, Machine Learning

What problem does this paper attempt to address?

The paper mainly addresses two problems:

1. **Balancing manual control and automated compilation optimization**
   - In current hardware accelerator design, hand-written high-performance kernels can achieve high efficiency, but they require substantial manual effort to design and verify. These kernels usually depend on specific data types and function signatures, which limits their adaptability to rapidly evolving applications and hardware.
   - Automated compilation techniques such as polyhedral compilation can generate on-chip buffers, streaming dataflow architectures, or systolic arrays from simple C/C++ code. However, these tools usually do not give designers enough control to explore different performance/cost trade-offs or to customize the memory hierarchy and communication schemes for new applications.
2. **Moving from single-kernel optimization to complex multi-kernel accelerator design**
   - Existing accelerator design languages (ADLs) mainly focus on optimizing individual kernels such as convolution and matrix multiplication. For real-world multi-kernel applications, they tend to generate flattened monolithic designs and ignore the interface incompatibilities or optimization conflicts that can arise when kernels are combined. This lack of composability impairs modularity and debuggability, and often leads to suboptimal performance.

To solve these problems, the paper proposes Allo, a composable programming model for efficient spatial accelerator design. Allo's main design principles are decoupled hardware customization primitives, a modular accelerator design process, and type-safe composition of individual components. Through these principles, Allo aims to improve the productivity and maintainability of developing high-performance accelerators.
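To make two of these principles more concrete, here is a minimal Python sketch under stated assumptions. It is not Allo's actual API: `TensorType`, `Kernel`, `split`, `pipeline`, and `compose` are hypothetical names chosen for illustration. The sketch models (a) customizations recorded as a separate schedule instead of edits to a kernel's body, and (b) a type check at the point where two kernels are connected, so an interface mismatch is reported before a composed design is built.

```python
from dataclasses import dataclass, field

# Hypothetical illustration only: these classes are NOT Allo's API.


@dataclass(frozen=True)
class TensorType:
    """Interface type of a kernel port: element type plus shape."""
    dtype: str
    shape: tuple


@dataclass
class Kernel:
    name: str
    inputs: tuple        # tuple of TensorType, one per input port
    output: TensorType
    # Customizations are stored separately from the algorithm, so applying
    # them never rewrites the kernel's source (the "decoupled" idea).
    schedule: list = field(default_factory=list)

    def split(self, loop, factor):
        self.schedule.append(("split", loop, factor))
        return self

    def pipeline(self, loop, ii=1):
        self.schedule.append(("pipeline", loop, ii))
        return self


def compose(producer, consumer, port=0):
    """Connect producer's output to consumer input `port`, checking types."""
    expected = consumer.inputs[port]
    if producer.output != expected:
        raise TypeError(
            f"{producer.name} produces {producer.output}, "
            f"but {consumer.name} expects {expected} on port {port}"
        )
    return (producer.name, consumer.name)


# Usage: schedule a GEMM kernel, then compose it with a ReLU kernel.
f32 = TensorType("float32", (32, 32))
gemm = Kernel("gemm", (f32, f32), f32).split("i", 4).pipeline("j")
relu = Kernel("relu", (f32,), f32)
edge = compose(gemm, relu)  # types match, so the connection is accepted

# A consumer expecting int8 data is rejected at composition time.
i8 = TensorType("int8", (32, 32))
bad = Kernel("quant_relu", (i8,), i8)
try:
    compose(gemm, bad)
except TypeError as err:
    print("rejected:", err)
```

Note that the schedule on `gemm` changes how the kernel would be implemented but not its interface type, which is why it can still be composed with `relu` unchanged.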
