Abstract: Recently, algorithm-hardware co-exploration for neural networks (NNs) has become the key to obtaining high-quality solutions. However, previous efforts for FPGAs focus on neural architecture search (NAS) and lack hardware architecture search (HAS), which limits the full potential of co-design. Although expanding the scope of HAS offers performance potential, the exponentially larger joint search space presents a formidable challenge. To address this, we propose a deep and efficient framework that jointly searches for Networks and Accelerators for FPGAs in a balanced co-search space. First, on the software side, we adjust the NAS space and introduce a block-level bitwidth search. We also design a hardware-friendly quantization algorithm that improves both hardware efficiency and accuracy. Second, we design a dataflow-configurable hardware unit with computation and memory access optimizations for quantized multiplication. Based on this unit, we incorporate heterogeneous multicore architecture exploration on the hardware side. Third, to enable rapid hardware feedback in the enlarged HAS space, we model resources and performance and design a fast hardware generation algorithm based on the genetic algorithm. Specifically, we apply optimization techniques such as mapping space pruning, greedy bandwidth allocation, and coarse-grained search to speed up this process. We validate the framework in edge and cloud scenarios. Experimental results show that the framework efficiently explores a significantly larger joint space and provides high-quality solutions. Compared with previous state-of-the-art co-design works, the searched CNN-accelerator pairs improve throughput by 2.07× to 7.10× and energy efficiency by 1.41× to 2.27× under similar accuracy on the ImageNet dataset.
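
The abstract names greedy bandwidth allocation as one of the techniques that speed up hardware generation but does not describe it. As a rough illustration only, the sketch below shows one way such a step could look for a heterogeneous multicore design. The roofline-style latency model, the function names, and the integer "bandwidth units" are all assumptions made for this example and are not taken from the paper.

```python
def core_latency(compute_cycles, traffic_bytes, bandwidth_units):
    """Assumed roofline-style model: a core is bound by compute or by memory traffic."""
    memory_cycles = traffic_bytes / bandwidth_units
    return max(compute_cycles, memory_cycles)


def greedy_bandwidth_allocation(cores, total_units):
    """Hypothetical greedy allocator (not the paper's algorithm).

    cores: list of (compute_cycles, traffic_bytes) tuples, one per core.
    total_units: number of indivisible bandwidth units to distribute.
    Returns the number of units assigned to each core.
    """
    if total_units < len(cores):
        raise ValueError("need at least one bandwidth unit per core")

    # Every core starts with one unit so its latency is finite.
    alloc = [1] * len(cores)
    for _ in range(total_units - len(cores)):
        latencies = [core_latency(c, t, a) for (c, t), a in zip(cores, alloc)]
        slowest = max(range(len(cores)), key=lambda i: latencies[i])
        compute_cycles, traffic_bytes = cores[slowest]
        # If the bottleneck core is already compute-bound, more bandwidth cannot help.
        if traffic_bytes / alloc[slowest] <= compute_cycles:
            break
        # Otherwise give the next unit to the current bottleneck core.
        alloc[slowest] += 1
    return alloc


# Two cores with equal compute but different memory traffic, six units to share.
allocation = greedy_bandwidth_allocation([(100, 800), (100, 300)], total_units=6)
print(allocation)
```

A greedy pass like this costs one latency evaluation per core per unit, which is far cheaper than enumerating every split of the bandwidth budget. That is the kind of saving that matters when the allocator runs inside each fitness evaluation of a genetic search.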

Invited: Algorithm-Software-Hardware Co-Design for Deep Learning Acceleration

A Near Memory Computing FPGA Architecture for Neural Network Acceleration

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability.

Recent Advances in Efficient Computation of Deep Convolutional Neural Networks

Efficient Hardware Optimization Strategies For Deep Neural Networks Acceleration Chip

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

HAO: Hardware-aware neural Architecture Optimization for Efficient Inference

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

Design Automation for Efficient Deep Learning Computing

Learned Hardware/Software Co-Design of Neural Accelerators

Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator

Designing Deep Learning Hardware Accelerator and Efficiency Evaluation

Towards Agile DNN Accelerator Design Using Incremental Synthesis on FPGAs

Adaptive design and implementation of automatic modulation recognition accelerator

Power-Driven DNN Dataflow Optimization on FPGA

Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment

Software-Hardware Codesign for Efficient Neural Network Acceleration

[DL] A Survey of FPGA-based Neural Network Inference Accelerators

Unleashing Network/Accelerator Co-Exploration Potential on FPGAs: A Deeper Joint Search

WPU: A FPGA-based Scalable, Efficient and Software/Hardware Co-design Deep Neural Network Inference Acceleration Processor