Abstract: With the increasing demand for computing capability under limited resource and power budgets, it is crucial to deploy applications to customized accelerators such as FPGAs. However, FPGA programming is non-trivial. Although existing high-level synthesis (HLS) tools improve productivity to a certain extent, their scope and capability are too limited to support sufficient FPGA-oriented optimizations. This paper focuses on FPGA-based accelerators and proposes POM, an optimizing framework built on multi-level intermediate representation (MLIR). POM has several features that demonstrate its scope and capability for performance optimization. First, most HLS tools depend exclusively on a single-level IR to perform all optimizations, which introduces excessive information into the IR and makes debugging an arduous task. In contrast, POM introduces three layers of IR so that each operation is performed at a suitable abstraction level, streamlining implementation and debugging and offering better flexibility, extensibility, and a more systematic structure. Second, POM integrates the polyhedral model into MLIR, enabling advanced dependence analysis and various FPGA-oriented loop transformations. Because nested loops are represented with integer sets and maps, loop transformations can be carried out conveniently by manipulating polyhedral semantics. Finally, to further reduce design effort, POM provides a user-friendly domain-specific language (DSL) that allows a concise description of computation and includes a rich collection of scheduling primitives. An automatic design space exploration (DSE) engine searches for high-performance optimization schemes efficiently and generates optimized accelerators automatically. Experimental results show that POM achieves a $6.46\times$ average speedup on typical benchmark suites and a $6.06\times$ average speedup on real-world applications compared to the state-of-the-art.
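To make the polyhedral idea in the abstract concrete, the following is a minimal sketch in plain Python. It is not POM's implementation and does not use MLIR or any polyhedral library; the names `iteration_domain`, `interchange`, `tile`, `apply_schedule`, and `is_legal` are hypothetical and chosen only for illustration. A two-level loop nest is modeled as an integer set of iteration points, a loop transformation as a map that assigns each point a new timestamp, and a dependence check as a comparison of timestamps.

```python
from itertools import product

def iteration_domain(n, m):
    """Integer set {(i, j) : 0 <= i < n, 0 <= j < m}, listed in original loop order."""
    return list(product(range(n), range(m)))

def identity(point):
    """Original schedule: execute points in (i, j) lexicographic order."""
    return point

def interchange(point):
    """Map (i, j) -> (j, i), i.e. loop interchange (hypothetical helper)."""
    i, j = point
    return (j, i)

def tile(size):
    """Return a map (i, j) -> (i // size, j // size, i % size, j % size), i.e. square tiling."""
    def schedule(point):
        i, j = point
        return (i // size, j // size, i % size, j % size)
    return schedule

def apply_schedule(domain, schedule):
    """Execution order after the transformation: sort points by their timestamps."""
    return sorted(domain, key=schedule)

def is_legal(domain, distance, schedule):
    """A schedule is legal for a dependence distance if every source still runs before its sink."""
    points = set(domain)
    di, dj = distance
    for (i, j) in domain:
        sink = (i + di, j + dj)
        if sink in points and not schedule((i, j)) < schedule(sink):
            return False
    return True

small = iteration_domain(2, 3)
interchanged = apply_schedule(small, interchange)
tiled = apply_schedule(iteration_domain(4, 4), tile(2))

# A dependence with distance (1, -1), as in a[i][j] = f(a[i-1][j+1]),
# is respected by the original order but violated by interchange.
ok_identity = is_legal(small, (1, -1), identity)
ok_interchange = is_legal(small, (1, -1), interchange)
```

Real polyhedral frameworks describe the set and the maps symbolically with affine constraints rather than enumerating points, so the same reasoning applies to parametric loop bounds; the enumeration here only keeps the sketch short and runnable.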

An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation

PuDianNao: A Polyvalent Machine Learning Accelerator

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs

Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators

A Ubiquitous Machine Learning Accelerator With Automatic Parallelization on FPGA

Survey and design of paleozoic: a high-performance compiler tool chain for deep learning inference accelerator

Deep Learning Accelerators' Configuration Space Exploration Effect on Performance and Resource Utilization: A Gemmini Case Study

Efficient Hardware Optimization Strategies For Deep Neural Networks Acceleration Chip

Design Space Exploration of FPGA-based Accelerators with Multi-Level Parallelism

A Small-Footprint Accelerator for Large-Scale Neural Networks

Automatic generation of efficient accelerators for reconfigurable hardware

A Formalism of DNN Accelerator Flexibility

A Hardware-Software Blueprint for Flexible Deep Learning Specialization

Learned Hardware/Software Co-Design of Neural Accelerators