ARA: Cross-Layer Approximate Computing Framework Based Reconfigurable Architecture for CNNs

Yu Gong, Bo Liu, Wei Ge, Longxing Shi
DOI: https://doi.org/10.1016/j.mejo.2019.03.011
IF: 1.992
2019-01-01
Microelectronics Journal
Abstract: Convolutional Neural Networks (CNNs) are now widely used in image processing, object detection, video detection, and other classification tasks. Because of their computational complexity and data dependence, CNN acceleration is also widely researched. To achieve high energy efficiency, we propose a CNN accelerator that uses approximate computing techniques. This paper studies two main aspects: hardware-compatible network compression algorithms, and approximate computing units and architectures with hardware resource scheduling strategies. For the algorithm approximation part, we introduce a dynamic layered CNN structure for different scales of input, a convolution kernel shrinking strategy with layer-by-layer quantization to compress networks, and the Winograd minimal filtering algorithm to reduce the number of operations in convolution layers. For the architecture part, we design two types of approximate multipliers: iterative multipliers and multi-port-SRAM-integrated LUT-based multipliers. Approximate adders with error correction logic are also designed. Based on these approximate computing units, we propose a Convolution Neural Processing Unit (CNPU) with reconfigurable datapath designs for mapping different tasks. Combining the algorithm work, the CNPU architecture, and the datapath design, we propose a highly energy-efficient reconfigurable CNN accelerator with approximate computing named ARA (Approximate computing based Reconfigurable Architecture). Implemented in a TSMC 45 nm process, our accelerator achieves an energy efficiency of 1.92 TOPS/W at 1.1 V, 200 MHz and 3.72 TOPS/W at 0.9 V, 40 MHz, which is 1.51 to 4.36 times better than state-of-the-art accelerators.
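As an illustration of how Winograd minimal filtering reduces the operation count in a convolution, the following is a minimal sketch of the standard one-dimensional F(2,3) case: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The abstract does not state which tile size or variant ARA uses, so the F(2,3) choice and the function names `winograd_f23` and `direct_f23` are assumptions made for this example only, not the authors' implementation.

```python
def direct_f23(d, g):
    """Direct 1D correlation: 2 outputs of a 3-tap filter, 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using only 4 multiplications.

    d: 4 input samples, g: 3 filter taps.
    The filter-side terms (ga, gb) depend only on g, so in a CNN they can be
    precomputed once per kernel and are not counted as per-tile multiplies.
    """
    # Filter transform (precomputable per kernel)
    ga = (g[0] + g[1] + g[2]) / 2
    gb = (g[0] - g[1] + g[2]) / 2

    # Four element-wise multiplications on transformed inputs
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * ga
    m3 = (d[2] - d[1]) * gb
    m4 = (d[1] - d[3]) * g[2]

    # Output transform (additions only)
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1, 2, 3, 4]
    g = [1, 2, 3]
    print(direct_f23(d, g))
    print(winograd_f23(d, g))
```

The multiplications are traded for extra additions, which is the property that makes this family of algorithms attractive when multipliers dominate energy cost.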