Abstract: Convolutional Neural Networks (CNNs) are now widely used in image processing, object detection, video detection, and other classification tasks. Because of their computational complexity and data dependence, CNN acceleration has also become a major research topic. To achieve high energy efficiency, we propose a CNN accelerator based on approximate computing techniques. This paper studies two main aspects: hardware-compatible network compression algorithms, and approximate computing units and architectures with hardware resource scheduling strategies. On the algorithm side, we introduce a dynamic layered CNN structure for inputs of different scales, a convolution kernel shrinking strategy combined with layer-by-layer quantization to compress networks, and the Winograd minimal filtering algorithm to reduce the number of operations in convolution layers. On the architecture side, we design two new types of approximate multipliers, iterative multipliers and lookup-table (LUT) based multipliers integrated with multi-port SRAM, as well as approximate adders with error correction logic. Built on these approximate computing units, the Convolution Neural Processing Unit (CNPU) is proposed with reconfigurable datapath designs for mapping different tasks. Combining the algorithm work, the CNPU architecture, and the datapath design, we present ARA (Approximate computing based Reconfigurable Architecture), a highly energy-efficient reconfigurable CNN accelerator with approximate computing. Implemented in a TSMC 45 nm process, the accelerator achieves an energy efficiency of 1.92 TOPS/W at 1.1 V and 200 MHz, and 3.72 TOPS/W at 0.9 V and 40 MHz, which is 1.51 to 4.36 times better than state-of-the-art accelerators.
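The abstract only names the Winograd minimal filtering algorithm as a way to cut convolution-layer operations. As an illustration, here is a minimal sketch of the textbook 1-D case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6 (the 2-D F(2x2,3x3) variant uses 16 instead of 36). This is not ARA's hardware datapath; the function names `winograd_f23` and `direct_f23` are hypothetical, and the filter transform is shown inline although hardware would typically precompute it once per kernel.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 1-D, 3-tap correlation using 4 multiplies.

    d: 4 input samples [d0, d1, d2, d3]; g: 3 filter taps [g0, g1, g2].
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (depends only on the kernel, so it can be precomputed).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform followed by the 4 element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: the same two outputs computed directly with 6 multiplies."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


print(winograd_f23([1, 2, 3, 4], [2, 3, 4]))  # → [20.0, 29.0]
print(direct_f23([1, 2, 3, 4], [2, 3, 4]))    # → [20, 29]
```

The trade-off is that multiplications, the costly operation in hardware, are replaced by cheaper additions in the input and output transforms.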

WRA: A 2.2-to-6.3 TOPS Highly Unified Dynamically Reconfigurable Accelerator Using a Novel Winograd Decomposition Algorithm for Convolutional Neural Networks

EWS: An Energy-Efficient CNN Accelerator with Enhanced Weight Stationary Dataflow

A Reconfigurable Accelerator Based on Fast Winograd Algorithm for Convolutional Neural Network in Internet of Things

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

WRA-MF: A Bit-Level Convolutional-Weight-Decomposition Approach to Improve Parallel Computing Efficiency for Winograd-Based CNN Acceleration

WRA-SS: A High-Performance Accelerator Integrating Winograd with Structured Sparsity for Convolutional Neural Networks

Flexible and Efficient Convolutional Acceleration on Unified Hardware Using the Two-Stage Splitting Method and Layer-Adaptive Allocation of 1-D/2-D Winograd Units

A Reconfigurable Accelerator for Sparse Convolutional Neural Networks

An FPGA-Based Reconfigurable CNN Training Accelerator Using Decomposable Winograd

A High-Performance Reconfigurable Accelerator for Convolutional Neural Networks

A Novel Low-Communication Energy-Efficient Reconfigurable CNN Acceleration Architecture

The Storage Structure of Convolutional Neural Network Reconfigurable Accelerator Based on ASIC

WinTA: An Efficient Reconfigurable CNN Training Accelerator with Decomposition Winograd

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs

A Conv-GEMM reconfigurable accelerator with WS-RS dataflow for high throughput processing

Design of a Generic Dynamically Reconfigurable Convolutional Neural Network Accelerator with Optimal Balance

VWA: Hardware Efficient Vectorwise Accelerator for Convolutional Neural Network

ARA: Cross-Layer Approximate Computing Framework Based Reconfigurable Architecture for CNNs

A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm

RNA: A Flexible and Efficient Accelerator Based on Dynamically Reconfigurable Computing for Multiple Convolutional Neural Networks

3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA