Allo: A Programming Model for Composable Accelerator Design

Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
DOI: https://doi.org/10.1145/3656401
2024-04-07
Abstract: Special-purpose hardware accelerators are increasingly pivotal for sustaining performance improvements in emerging applications, especially as the benefits of technology scaling continue to diminish. However, designers currently lack effective tools and methodologies to construct complex, high-performance accelerator architectures in a productive manner. Existing high-level synthesis (HLS) tools often require intrusive source-level changes to attain satisfactory quality of results. Despite the introduction of several new accelerator design languages (ADLs) aiming to enhance or replace HLS, their advantages are more evident in relatively simple applications with a single kernel. Existing ADLs prove less effective for realistic hierarchical designs with multiple kernels, even if the design hierarchy is flattened.
Programming Languages, Hardware Architecture, Machine Learning
What problem does this paper attempt to address?
The paper addresses two main problems:

1. **Balancing Manual Control and Automated Compilation Optimization**:
   - Hand-crafted high-performance kernels can be very efficient, but they require a great deal of manual design and verification effort. They are usually tied to specific data types and function signatures, which limits their adaptability to rapidly evolving applications and hardware.
   - Automated compilation techniques such as polyhedral compilation can generate on-chip buffers, streaming dataflow architectures, or systolic arrays from simple C/C++ code. However, these tools usually do not give designers enough control to explore different performance/cost trade-offs or to customize the memory hierarchy and communication schemes for new applications.
2. **Transition from Single-Kernel Optimization to Complex Multi-Kernel Accelerator Design**:
   - Existing accelerator design languages (ADLs) mainly focus on optimizing individual application kernels, such as convolution and matrix multiplication. For real-world multi-kernel applications, these ADLs tend to generate flattened, monolithic designs, ignoring the interface incompatibilities and optimization conflicts that can arise when different kernels are combined. This lack of support for composability impairs modularity and debuggability, and often leads to sub-optimal performance.

To solve these problems, the paper proposes Allo, a composable programming model for efficient spatial accelerator design. Its main design principles are decoupled hardware customization primitives, a modular accelerator design process, and type-safe composition of individual components. Through these principles, Allo aims to improve the productivity and maintainability of developing high-performance accelerators.
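
The three design principles named above can be illustrated with a toy model. The following is a minimal sketch in plain Python, not Allo's actual API: the names `Kernel`, `Schedule`, `tile`, `pipeline`, and `compose`, as well as the type strings, are hypothetical and chosen only for illustration. It assumes that "decoupled customization" means hardware directives are recorded separately from the algorithm description, and that "type-safe composition" means interface types are checked when two kernels are connected.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical toy model for illustration only; this is NOT Allo's API.
@dataclass(frozen=True)
class Kernel:
    """Algorithm description only: a name plus its interface types."""
    name: str
    in_type: str
    out_type: str


@dataclass
class Schedule:
    """Hardware customizations, recorded separately from the kernel."""
    kernel: Kernel
    directives: List[str] = field(default_factory=list)

    def tile(self, loop: str, factor: int) -> "Schedule":
        self.directives.append(f"tile({loop}, {factor})")
        return self

    def pipeline(self, loop: str) -> "Schedule":
        self.directives.append(f"pipeline({loop})")
        return self


def compose(producer: Schedule, consumer: Schedule) -> List[str]:
    """Connect two separately customized kernels, checking interface types."""
    if producer.kernel.out_type != consumer.kernel.in_type:
        raise TypeError(
            f"{producer.kernel.name} outputs {producer.kernel.out_type}, "
            f"but {consumer.kernel.name} expects {consumer.kernel.in_type}"
        )
    plan: List[str] = []
    for sched in (producer, consumer):
        # Each kernel keeps its own directives; nothing is flattened away.
        plan.extend(f"{sched.kernel.name}: {d}" for d in sched.directives)
    return plan


# Each kernel is customized on its own, then the two are composed.
gemm = Schedule(Kernel("gemm", "int8[64,64]", "int32[64,64]")).tile("i", 8).pipeline("j")
relu = Schedule(Kernel("relu", "int32[64,64]", "int32[64,64]")).pipeline("i")
plan = compose(gemm, relu)
```

In this sketch the kernel definitions never change when directives are added, and a mismatch between a producer's output type and a consumer's input type is rejected at composition time rather than discovered after the design has been flattened.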