Abstract: Although convolutional neural network (CNN) models have greatly advanced many fields, their excessive number of parameters and computations poses significant performance and energy challenges in hardware implementations. Transferred filter-based methods, a very promising class of techniques not yet explored in the architecture domain, can substantially compress CNN models. However, a straightforward hardware implementation of these methods inherently incurs massive redundant computations, wasting significant energy and time. In this work, a highly efficient transferred filter-based engine (TFE) is developed to address this deficiency by both compressing and accelerating CNN models. First, the filters of CNN models are flexibly transferred according to the specific task to reduce the model size. Then, two hardware-friendly mechanisms are proposed in the TFE to remove the duplicate computations caused by transferred filters, further accelerating transferred CNN models. The first mechanism exploits the shared weights hidden in each row of the transferred filters and reuses the corresponding identical partial sums, eliminating at least 25% of the repetitive computations in each row. The second mechanism schedules and accesses the memory system so that repeated partial sums are reused across different rows of the transferred filters, eliminating at least 25% of computations. Furthermore, an efficient hardware architecture is proposed in the TFE to fully reap the benefits of the two mechanisms while flexibly supporting different types of networks. To achieve high energy efficiency, a sub-array-based filter mapping method (SAFM) is proposed, in which the processing element (PE) sub-array serves as the elementary computational unit to support various filters. In this method, input data can be efficiently broadcast within each PE sub-array, and the load can be stripped from each PE and greatly alleviated, which dramatically reduces area and power consumption. Except for MobileNet-like networks that adopt depth-wise convolution, most mainstream networks can be compressed and accelerated by the proposed TFE. Two state-of-the-art transferred filter-based methods, doubly CNN and symmetry CNN, are implemented using the TFE. Compared with Eyeriss, average speedups of 2.93× and 3.17× are achieved in the convolutional layers of various modern CNNs, and the overall energy efficiency is improved by 12.66× and 13.31× on average. Compared with other state-of-the-art related works, the TFE achieves up to a 4.0× parameter reduction, a 2.72× speedup, and a 10.74× energy efficiency improvement on VGGNet.
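
The row-level partial-sum reuse described in the abstract can be illustrated with a toy sketch. It assumes a doubly-CNN-style transfer in which two 3-tap filter rows are cut from one 4-tap meta row at offsets 0 and 1, so the two rows share two weights. The function names, the sizes, and the way multiplications are counted are illustrative assumptions and do not describe the actual TFE datapath.

```python
def naive_rows(x, meta):
    """Convolve x with both transferred rows independently (no reuse)."""
    f0, f1 = meta[0:3], meta[1:4]          # two rows cut from the meta row
    n_out = len(x) - 2                     # valid 1-D convolution, 3 taps
    y0 = [sum(f0[k] * x[i + k] for k in range(3)) for i in range(n_out)]
    y1 = [sum(f1[k] * x[i + k] for k in range(3)) for i in range(n_out)]
    mults = 2 * 3 * n_out                  # 3 multiplications per output, 2 rows
    return y0, y1, mults


def reuse_rows(x, meta):
    """Same outputs, but the partial sum over the shared weights is computed once."""
    m0, m1, m2, m3 = meta
    n_out = len(x) - 2
    # Shared partial sum over the overlapping weights: s[i] = m1*x[i] + m2*x[i+1]
    s = [m1 * x[i] + m2 * x[i + 1] for i in range(len(x) - 1)]
    # Row 0 = [m0, m1, m2] uses s shifted by one; row 1 = [m1, m2, m3] uses s directly
    y0 = [m0 * x[i] + s[i + 1] for i in range(n_out)]
    y1 = [s[i] + m3 * x[i + 2] for i in range(n_out)]
    mults = 2 * len(s) + 2 * n_out         # shared sums plus one extra tap per row
    return y0, y1, mults


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]           # hypothetical input row
    meta = [1, 2, 3, 4]                    # hypothetical 4-tap meta row
    a0, a1, naive_mults = naive_rows(x, meta)
    b0, b1, reuse_mults = reuse_rows(x, meta)
    print(a0 == b0 and a1 == b1, naive_mults, reuse_mults)
```

In this toy setting the reuse version needs 26 multiplications instead of 36 for an 8-element input, because each shared partial sum is consumed by both rows. The saving in a real design depends on the filter shapes and the transfer scheme.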

An Efficient Dataflow Mapping Method for Convolutional Neural Networks

EWS: an Energy-Efficient CNN Accelerator with Enhanced Weight Stationary Dataflow

Optical Convolution Based Computational Method for Low-Power Image Processing

A Reconfigurable Spatial Architecture for Energy-Efficient Inception Neural Networks

Dataflow Aware Mapping of Convolutional Neural Networks Onto Many-Core Platforms With Network-on-Chip Interconnect

A 3D Hybrid Optical-Electrical NoC Using Novel Mapping Strategy Based DCNN Dataflow Acceleration

Optimizing Convolutional Neural Networks on Multi-Core Vector Accelerator

Stacked Filters Stationary Flow For Hardware-Oriented Acceleration Of Deep Convolutional Neural Networks

COSY: an Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array

A Novel CONV Acceleration Strategy Based on Logical PE Set Segmentation for Row Stationary Dataflow

Relative Indexed Compressed Sparse Filter Encoding Format for Hardware-Oriented Acceleration of Deep Convolutional Neural Networks

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

Data-centric Computation Mode for Convolution in Deep Neural Networks

Optimizing CNN Hardware Acceleration with Configurable Vector Units and Feature Layout Strategies

TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks

EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators

A High Efficient Architecture for Convolution Neural Network Accelerator

A Scalable 3D Array Architecture for Accelerating Convolutional Neural Networks

Local Channel Transformation for Efficient Convolutional Neural Network

SemiMap: A Semi-Folded Convolution Mapping for Speed-Overhead Balance on Crossbars

Espace: Accelerating Convolutional Neural Networks Via Eliminating Spatial and Channel Redundancy