Abstract: Accurate and efficient machine learning algorithms are vital to many problems, especially classification and clustering tasks, but the field lacks a universal AI model standard. Unifying machine learning models into a common ecosystem can reduce development time and improve framework interoperability. ONNX (Open Neural Network Exchange) is a popular open format for representing deep learning models so that AI developers can more easily move models between state-of-the-art tools. In addition, hardware companies such as Nvidia and Intel are following this trend by producing hardware-optimized runtimes (e.g., for CPUs, GPUs, and FPGAs) that can execute models in open formats like ONNX. This enables developers to leverage a heterogeneous mix of hardware and use whichever AI framework they prefer. FPGAs, however, are more challenging to target, even though the platform has been shown to address these kinds of problems very efficiently in terms of performance and power. This work builds on HLS4ML, a project at an early stage of development that was originally created for particle physics applications and automatically generates neural networks (NNs) for embedded Xilinx FPGAs. Our work adds hardware-aware NN training and a generalized optimization scheme on top of HLS4ML that boosts the performance and power efficiency of the package and adds the ability to generate cloud FPGA firmware from any NN model. We start with FPGA-oriented training of a Keras model for image recognition, convert it to the ONNX open format, and then port and optimize it for cloud FPGAs using a novel scheme that applies optimizations to the host, memory, and kernels while using multiple levels of network precision. To the best of our knowledge, this is a novel approach, and it achieves a speed-up of up to 102× over a single CPU in performance and up to 5.5× over a GPU in performance per watt.
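
To make the idea of "multiple levels of network precision" concrete, the following is a minimal sketch of signed fixed-point quantization applied to a few weights at several bit widths. It is an illustration only, not the scheme used in this work: the function name `quantize_fixed`, the sample weights, the chosen bit widths, and the use of round-to-nearest with saturation are all assumptions made for this example.

```python
import math

def quantize_fixed(value, total_bits, int_bits):
    """Quantize a float to signed fixed point (hypothetical helper).

    total_bits: total word width, sign bit included.
    int_bits:   integer bits, sign bit included; the rest are fractional.
    Assumes round-to-nearest and saturation on overflow.
    """
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    lowest = -(2 ** (total_bits - 1))      # most negative integer code
    highest = 2 ** (total_bits - 1) - 1    # most positive integer code
    code = math.floor(value * scale + 0.5)  # round to nearest step
    code = max(lowest, min(highest, code))  # saturate instead of wrapping
    return code / scale

# Illustrative weights and precisions (not taken from the paper).
weights = [0.7312, -0.0519, 1.9, -2.4]
for total_bits, int_bits in [(16, 6), (8, 3), (4, 2)]:
    quantized = [quantize_fixed(w, total_bits, int_bits) for w in weights]
    worst = max(abs(w - q) for w, q in zip(weights, quantized))
    print(f"<{total_bits},{int_bits}> {quantized} max error {worst:.4f}")
```

Narrower words cost fewer FPGA resources per multiply but increase quantization error and, once the integer range is exceeded, saturate large weights; this trade-off is what a precision sweep of this kind is meant to expose.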

ONNX-to-Hardware Design Flow for the Generation of Adaptive Neural-Network Accelerators on FPGAs

ONNX-to-Hardware Design Flow for Adaptive Neural-Network Inference on FPGAs

A Near Memory Computing FPGA Architecture for Neural Network Acceleration

An Automated Design Flow for Adaptive Neural Network Hardware Accelerators

Utilizing cloud FPGAs towards the open neural network standard

Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors

Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms

A Survey of FPGA-Based Neural Network Accelerator

Optimizing Neural Network Inference in Edge Robotics by Harnessing FPGA Hardware Acceleration

Towards Agile DNN Accelerator Design Using Incremental Synthesis on FPGAs

An Automated Hardware Design Framework for Various DNNs Based on ChatGPT

HAO: Hardware-aware neural Architecture Optimization for Efficient Inference

E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs

NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA

Design of Network-on-Chip-Based Restricted Coulomb Energy Neural Network Accelerator on FPGA Device

Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function

A Survey of FPGA-based Neural Network Inference Accelerators

Adaptive design and implementation of automatic modulation recognition accelerator

Heterogeneous Systems with Reconfigurable Neuromorphic Computing Accelerators

Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs

Unleashing Network/Accelerator Co-Exploration Potential on FPGAs: A Deeper Joint Search