HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description

Kingshuk Majumder, Uday Bondhugula

DOI: https://doi.org/10.48550/arXiv.2103.00194

2021-02-27

Abstract: The emergence of machine learning, image and audio processing on edge devices has motivated research towards power efficient custom hardware accelerators. Though FPGAs are an ideal target for energy efficient custom accelerators, the difficulty of hardware design and the lack of vendor agnostic, standardized hardware compilation infrastructure has hindered their adoption. This paper introduces HIR, an MLIR-based intermediate representation (IR) to describe hardware accelerator designs. HIR combines high level language features, such as loops and multi-dimensional tensors, with programmer defined explicit scheduling, to provide a high-level IR suitable for DSL compiler pipelines without compromising control over the micro-architecture of the accelerator. HIR's explicit schedules allow it to express fine-grained, synchronization-free parallelism and optimizations such as retiming and pipelining. Built as a dialect in MLIR, it draws from best IR practices learnt from communities like those of LLVM. While offering rich optimization opportunities and a high level abstraction, HIR enables sharing of optimizations, utilities and passes with software compiler infrastructure. Our implementation shows that the code generation time of the HIR code generator is on average 1112x lower than that of Xilinx Vivado HLS on a range of kernels without a compromise on the quality of the generated hardware. We believe that these are significant steps forward in the design of IRs for hardware synthesis and in equipping domain-specific languages with a productive and performing compilation path to custom hardware acceleration.

Hardware Architecture, Programming Languages

What problem does this paper attempt to address?

This paper addresses the difficulty of designing high-performance, low-power custom hardware accelerators for edge devices. FPGAs (Field-Programmable Gate Arrays) are an ideal target for energy-efficient custom accelerators, but the difficulty of hardware design and the lack of a vendor-neutral, standardized hardware compilation infrastructure have hindered their adoption. To address this, the paper proposes HIR, an intermediate representation (IR) built on MLIR (Multi-Level Intermediate Representation) for describing hardware accelerator designs. HIR combines high-level language features (such as loops and multi-dimensional tensors) with programmer-defined explicit scheduling, giving a high-level IR suitable for DSL (Domain-Specific Language) compiler pipelines without sacrificing control over the accelerator's micro-architecture.

The main features of HIR include:

- **Explicit Scheduling**: Allows the expression of fine-grained, synchronization-free parallelism and of optimizations such as retiming and pipelining.
- **Rich Optimization Opportunities**: Offers rich optimization opportunities and allows optimizations, utilities, and passes to be shared with software compiler infrastructure.
- **High-Performance Code Generation**: Experiments show that the code generation time of the HIR code generator is on average 1,112 times lower than that of Xilinx Vivado HLS on a range of kernels, without compromising the quality of the generated hardware.

Through these features, HIR aims to advance the design of IRs for hardware synthesis and to give domain-specific languages a productive, well-performing compilation path to custom hardware acceleration.
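The idea of programmer-defined explicit scheduling can be illustrated with a small toy model. The sketch below is not HIR syntax and does not reproduce the paper's implementation; the names `Op`, `check_schedule`, and `pipeline_starts` are hypothetical, and the latencies are made-up values. It only shows the general principle: when every operation carries an explicit start cycle, a tool can statically check that each operand is ready in time, so independent operations can run in parallel without run-time synchronization, and a pipelined loop is simply the same schedule reissued every initiation interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A toy operation with a programmer-chosen start cycle (hypothetical model)."""
    name: str
    start: int          # cycle at which the operation is issued
    latency: int        # cycles until its result becomes available
    inputs: tuple = ()  # names of the operations whose results it consumes


def check_schedule(ops):
    """Return (consumer, producer, ready_cycle) for every operand used too early."""
    by_name = {op.name: op for op in ops}
    violations = []
    for op in ops:
        for src in op.inputs:
            ready = by_name[src].start + by_name[src].latency
            if op.start < ready:
                violations.append((op.name, src, ready))
    return violations


def pipeline_starts(op_start, initiation_interval, iterations):
    """Start cycles of one operation across pipelined loop iterations."""
    return [op_start + i * initiation_interval for i in range(iterations)]


# Two loads issue in the same cycle (parallel, no handshake needed),
# the multiply waits one cycle for them, the store waits for the multiply.
schedule = [
    Op("load_a", start=0, latency=1),
    Op("load_b", start=0, latency=1),
    Op("mul", start=1, latency=2, inputs=("load_a", "load_b")),
    Op("store", start=3, latency=1, inputs=("mul",)),
]

# The same design with the multiply issued one cycle too early.
bad_schedule = [
    Op("load_a", start=0, latency=1),
    Op("load_b", start=0, latency=1),
    Op("mul", start=0, latency=2, inputs=("load_a", "load_b")),
    Op("store", start=3, latency=1, inputs=("mul",)),
]

print(check_schedule(schedule))      # no violations expected
print(check_schedule(bad_schedule))  # the multiply reads both loads too early
print(pipeline_starts(1, 1, 4))      # multiply start cycles with interval 1
```

In this model, retiming corresponds to shifting `start` values while keeping `check_schedule` free of violations, and pipelining corresponds to choosing a small initiation interval in `pipeline_starts`.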
