Abstract: Existing FPGA-based DNN accelerators typically fall into two design paradigms. They either adopt a generic reusable architecture that supports different DNNs but leaves some performance and efficiency on the table because design specificity is sacrificed, or apply a layer-wise tailor-made architecture that optimizes layer-specific demands for computation and resources but loses the scalability needed to adapt to a wide range of DNNs. To overcome these drawbacks, this paper proposes a novel FPGA-based DNN accelerator design paradigm and its automation tool, called DNNExplorer, to enable fast exploration of various accelerator designs under the proposed paradigm and deliver optimized accelerator architectures for existing and emerging DNNs. Three key techniques are essential for DNNExplorer's improved performance, better specificity, and scalability: (1) a unique accelerator design paradigm with both high-dimensional design space support and fine-grained adjustability, (2) a dynamic design space to accommodate different combinations of DNN workloads and targeted FPGAs, and (3) a design space exploration (DSE) engine that generates optimized accelerator architectures following the proposed paradigm by simultaneously considering the FPGAs' computation and memory resources and the DNNs' layer-wise characteristics and overall complexity. Experimental results show that, on the same FPGAs, accelerators generated by DNNExplorer deliver up to 4.2x higher performance (GOP/s) than the state-of-the-art layer-wise pipelined solutions generated by DNNBuilder for a VGG-like DNN with 38 CONV layers. Compared to accelerators with generic reusable computation units, DNNExplorer improves DSP efficiency by up to 2.0x over a recently published accelerator design from academia (HybridDNN) and by up to 4.4x over a commercial DNN accelerator IP (Xilinx DPU).

Unleashing Network/Accelerator Co-Exploration Potential on FPGAs: A Deeper Joint Search

NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

HAO: Hardware-aware neural Architecture Optimization for Efficient Inference

Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators

WPU: A FPGA-based Scalable, Efficient and Software/Hardware Co-design Deep Neural Network Inference Acceleration Processor

DNA: Differentiable Network-Accelerator Co-Search

A Ubiquitous Machine Learning Accelerator With Automatic Parallelization on FPGA

Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator

Design Exploration of Multi-FPGAs for Accelerating Deep Learning

DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs

Device-Circuit-Architecture Co-Exploration for Computing-in-Memory Neural Accelerators

Beyond Training: A Zero-Shot Framework to Neural Architecture and Accelerator Co-Exploration

Invited: Algorithm-Software-Hardware Co-Design for Deep Learning Acceleration

DeepBurning-MixQ: An Open Source Mixed-Precision Neural Network Accelerator Design Framework for FPGAs

A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities

Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search