Abstract: Mixed-precision neural networks (MPNNs), which use just enough data width for a given deep learning task, promise significant advantages in both inference accuracy and computing overhead. FPGAs, with their fine-grained reconfiguration capability, can adapt processing to distinct data widths and models and hence can, in theory, unleash the potential of MPNNs. Nevertheless, commodity DPUs on FPGAs mostly emphasize generality and offer limited support for MPNNs, especially those with lower data widths. In addition, the primitive DSPs in FPGAs usually have a much larger data width than MPNNs require, and they have not yet been sufficiently co-explored with MPNNs. To this end, we propose an open-source MPNN accelerator design framework specifically tailored for FPGAs. The framework includes a systematic DSP-packing algorithm that packs multiple lower-data-width MACs into a single primitive DSP and enables efficient implementation of MPNNs. Meanwhile, we consider DSP packing efficiency together with MPNN quantization within a unified neural architecture search (NAS) framework, so that the search is aware of the DSP overhead during quantization and optimizes MPNN performance and accuracy concurrently. Finally, we fine-tune the optimized MPNN to a fully pipelined, HLS-based neural network accelerator template that makes the best use of available resources for higher performance. Our experiments show that the accelerators produced by the proposed framework achieve significant advantages in performance, resource utilization, and inference accuracy for MPNNs compared with both handcrafted counterparts and prior hardware-aware neural network accelerators on FPGAs.
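The idea of packing several low-data-width multiplications into one wide multiplier can be illustrated with a minimal sketch. This is not the framework's DSP-packing algorithm, which the abstract does not detail. It is the basic unsigned case, in which two weights share one activation, and the function name `packed_multiply` and the `bits` parameter are hypothetical and introduced only for illustration.

```python
def packed_multiply(activation, w_hi, w_lo, bits=4):
    """Compute (w_hi * activation, w_lo * activation) with one wide multiply.

    Assumption: all operands are unsigned integers below 2**bits.
    """
    limit = 1 << bits
    if not all(0 <= v < limit for v in (activation, w_hi, w_lo)):
        raise ValueError("operands must be unsigned and fit in `bits` bits")

    # Each partial product is below 2**(2*bits), so placing the two weights
    # 2*bits apart keeps the low product from spilling into the high one.
    shift = 2 * bits
    packed = (w_hi << shift) | w_lo      # one wide operand for the multiplier
    product = packed * activation        # the single wide multiplication

    low = product & ((1 << shift) - 1)   # recovers w_lo * activation
    high = product >> shift              # recovers w_hi * activation
    return high, low


if __name__ == "__main__":
    print(packed_multiply(5, 3, 7))  # → (15, 35)
```

With 4-bit operands, the packed operand needs only 12 bits, which is why a wide DSP multiplier is underused by a single low-precision multiply. Signed operands and accumulation across several products need extra guard bits and correction terms, which this sketch leaves out.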

iFPNA: A Flexible and Efficient Deep Neural Network Accelerator with a Programmable Data Flow Engine in 28nm CMOS

iFPNA: A Flexible and Efficient Deep Learning Processor in 28-nm CMOS Using a Domain-Specific Instruction Set and Reconfigurable Fabric

A Near Memory Computing FPGA Architecture for Neural Network Acceleration

High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

Exploring the Programmability for Deep Learning Processors: from Architecture to Tensorization

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

A 28nm Configurable Asynchronous SNN Accelerator with Energy-Efficient Learning

FP-DNN: an Automated Framework for Mapping Deep Neural Networks Onto FPGAs with RTL-HLS Hybrid Templates

FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters

A Data-Driven Asynchronous Neural Network Accelerator

Neural Synaptic Plasticity-Inspired Computing: A High Computing Efficient Deep Convolutional Neural Network Accelerator

Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept

DeepBurning-MixQ: An Open Source Mixed-Precision Neural Network Accelerator Design Framework for FPGAs

An Efficient Accelerator for Sparse Convolutional Neural Networks

A Convolutional Neural Network Accelerator Based on FPGA

An All-Digital Compute-In-Memory FPGA Architecture for Deep Learning Acceleration

A 16 nJ/Classification FPGA-Based Wired-Logic DNN Accelerator Using Fixed-Weight Non-Linear Neural Net

A flexible dataflow CNN accelerator on FPGA

A Small-Footprint Accelerator for Large-Scale Neural Networks

SPAT: FPGA-based Sparsity-Optimized Spiking Neural Network Training Accelerator with Temporal Parallel Dataflow

A Conv‐GEMM reconfigurable accelerator with WS‐RS dataflow for high throughput processing