Abstract: Existing FPGA-based DNN accelerators typically fall into two design paradigms. They either adopt a generic reusable architecture that supports different DNN networks but leaves some performance and efficiency on the table because it sacrifices design specificity, or they apply a layer-wise tailor-made architecture that optimizes layer-specific demands for computation and resources but loses the scalability needed to adapt to a wide range of DNN networks. To overcome these drawbacks, this paper proposes a novel FPGA-based DNN accelerator design paradigm and its automation tool, called DNNExplorer, to enable fast exploration of various accelerator designs under the proposed paradigm and deliver optimized accelerator architectures for existing and emerging DNN networks. Three key techniques underpin DNNExplorer's improved performance, better specificity, and scalability: (1) a unique accelerator design paradigm with both high-dimensional design space support and fine-grained adjustability, (2) a dynamic design space to accommodate different combinations of DNN workloads and targeted FPGAs, and (3) a design space exploration (DSE) engine that generates optimized accelerator architectures following the proposed paradigm by simultaneously considering both the FPGAs' computation and memory resources and the DNN networks' layer-wise characteristics and overall complexity. Experimental results show that, for the same FPGAs, accelerators generated by DNNExplorer deliver up to 4.2x higher performance (GOP/s) than the state-of-the-art layer-wise pipelined solutions generated by DNNBuilder for a VGG-like DNN with 38 CONV layers. Compared to accelerators with generic reusable computation units, DNNExplorer achieves up to 2.0x and 4.4x higher DSP efficiency than a recently published accelerator design from academia (HybridDNN) and a commercial DNN accelerator IP (Xilinx DPU), respectively.

FP-DNN: an Automated Framework for Mapping Deep Neural Networks Onto FPGAs with RTL-HLS Hybrid Templates

A Near Memory Computing FPGA Architecture for Neural Network Acceleration

High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters

Mapping Large-Scale DNNs on Asymmetric FPGAs (Abstract Only)

DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator

An Efficient Mapping Approach to Large-Scale DNNs on Multi-FPGA Architectures

DeepBurning-MixQ: An Open Source Mixed-Precision Neural Network Accelerator Design Framework for FPGAs

A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices

Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs

A FPGA-based end-to-end acceleration framework for fast deployment of Convolutional Neural Networks

FP-BNN: Binarized neural network on FPGA

HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation

An All-Digital Compute-In-Memory FPGA Architecture for Deep Learning Acceleration

Exploring the Programmability for Deep Learning Processors: from Architecture to Tensorization

FPNet: Customized Convolutional Neural Network for FPGA Platforms

Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

A Flexible DNN Accelerator Design with Layer Pipeline for FPGAs

Tactics to Directly Map CNN Graphs on Embedded FPGAs

LCP: a Layer Clusters Paralleling Mapping Method for Accelerating Inception and Residual Networks on FPGA