Design of a Generic Dynamically Reconfigurable Convolutional Neural Network Accelerator with Optimal Balance

Haoran Tong, Ke Han, Si Han, Yingqi Luo

DOI: https://doi.org/10.3390/electronics13040761

IF: 2.9

2024-02-15

Electronics

Abstract: In many scenarios, edge devices perform computations for applications such as target detection and tracking, multimodal sensor fusion, low-light image enhancement, and image segmentation. There is an increasing trend of deploying and running multiple different network models on one hardware platform, but there is a lack of generic acceleration architectures that support standard convolution (CONV), depthwise separable CONV, and deconvolution (DeCONV) layers in such complex scenarios. In response, this paper proposes a more versatile dynamically reconfigurable CNN accelerator with a highly unified computing scheme. The proposed design, which is compatible with standard CNNs, lightweight CNNs, and CNNs with DeCONV layers, further improves the resource utilization and reduces the gap of efficiency when deploying different models. Thus, the hardware balance during the alternating execution of multiple models is enhanced. Compared to a state-of-the-art CNN accelerator, Xilinx DPU B4096, our optimized architecture achieves resource utilization improvements of 1.08× for VGG16 and 1.77× for MobileNetV1 in inference tasks on the Xilinx ZCU102 platform. The resource utilization and efficiency degradation between these two models are reduced to 59.6% and 63.7%, respectively. Furthermore, the proposed architecture can properly run DeCONV layers and demonstrates good performance.

engineering, electrical & electronic; computer science, information systems; physics, applied

What problem does this paper attempt to address?

This paper addresses the lack of a general-purpose acceleration architecture that efficiently supports standard convolution (CONV), depthwise separable CONV, and deconvolution (DeCONV) layers when a variety of network models are deployed and run on edge devices. Specifically, existing hardware accelerators suffer from low resource utilization and unbalanced efficiency across different types of convolution layers. When multiple models are executed alternately, the system workload fluctuates greatly, making it difficult for a fixed hardware architecture to adapt to such dynamically changing application requirements. To meet this challenge, the paper proposes a highly unified computing scheme and designs a general-purpose, dynamically reconfigurable convolutional neural network (CNN) accelerator. The accelerator is compatible with standard CNNs, lightweight CNNs, and CNNs containing DeCONV layers, further improving resource utilization and reducing the efficiency gap between different models. By optimizing the hardware architecture, the paper aims to improve hardware balance, especially in complex application scenarios, and to enhance compatibility and the energy-efficiency balance during multi-model deployment.
