COSY: an Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array.

Chen Yin, Qiang Chen, Miren Tian, Mohan Ji, Chenglong Zou, Yin'an Wang, Bo Wang
DOI: https://doi.org/10.1109/icpads.2017.00034
2017-01-01
Abstract: Deep convolutional neural networks (CNNs) show extraordinary abilities in artificial intelligence applications, but their heavy computational load usually limits their use on resource-constrained devices. When accelerating CNNs, exploiting data reuse is an effective way to reduce bandwidth and energy consumption. The row-stationary (RS) dataflow of Eyeriss is one of the most energy-efficient state-of-the-art hardware architectures, but it incurs redundant storage usage and data access, so data reuse is not fully exploited. It also requires complex control and is intrinsically unable to skip over zero-valued inputs in its timing. In this paper, we present COSY (CNN on Systolic Array), an energy-efficient hardware architecture for CNNs based on the systolic array. COSY uses the systolic array to share storage between processing elements (PEs) in the RS dataflow at the register file (RF) level, which reduces both low-level energy consumption and on-chip storage. Multiple COSY arrays sharing the same storage can execute multiple 2-D convolutions in parallel, further increasing data reuse in the low-level storage and improving throughput. To compare the energy consumption of COSY and Eyeriss when running actual CNN models, we build a process-based energy consumption evaluation system that follows the hardware storage hierarchy. The results show that COSY achieves a reduction of more than 15% in energy consumption under the same constraints, improving the theoretical Energy-Delay Product (EDP) and Energy-Delay Squared Product (ED²P) by 1.33× on average. In addition, we prove that COSY has an intrinsic ability for zero-skipping, which can further increase the improvements to 2.25× and 3.83×, respectively.
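The two figures of merit named in the abstract follow their standard definitions: EDP is energy multiplied by delay, and ED²P is energy multiplied by delay squared. The minimal sketch below shows how an improvement factor for each could be computed from an energy and delay pair. The function names and all numeric inputs are hypothetical illustrations, not values or code from the paper.

```python
def edp(energy: float, delay: float) -> float:
    """Energy-Delay Product: energy multiplied by delay."""
    return energy * delay


def ed2p(energy: float, delay: float) -> float:
    """Energy-Delay Squared Product: energy multiplied by delay squared."""
    return energy * delay ** 2


def improvement(baseline: float, candidate: float) -> float:
    """Improvement factor; a value above 1 means the candidate metric is lower (better)."""
    return baseline / candidate


# Hypothetical, normalized inputs (NOT taken from the paper):
# the baseline is 1.0 for both energy and delay.
baseline_energy, baseline_delay = 1.0, 1.0
candidate_energy, candidate_delay = 0.8, 0.5

edp_gain = improvement(
    edp(baseline_energy, baseline_delay),
    edp(candidate_energy, candidate_delay),
)
ed2p_gain = improvement(
    ed2p(baseline_energy, baseline_delay),
    ed2p(candidate_energy, candidate_delay),
)

print(f"EDP improvement:  {edp_gain:.2f}x")
print(f"ED2P improvement: {ed2p_gain:.2f}x")
```

Because delay enters ED²P squared, any reduction in delay (for example from skipping zero-valued inputs) raises the ED²P improvement more than the EDP improvement, which is consistent with the larger ED²P factor reported for zero-skipping.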