7.5 A 65nm 0.39-to-140.3 TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm² and 6T HBST …

Jinshan Yue, Ruoyang Liu, Wenyu Sun, Zhe Yuan, Zhibo Wang, Yung-Ning Tu, Yi-Ju Chen, Ao Ren, Yanzhi Wang, Meng-Fan Chang, Xueqing Li, Huazhong Yang, Yongpan Liu
2019-01-01
Abstract: Energy-efficient neural-network (NN) processors have been proposed for battery-powered deep-learning applications, where convolutional (CNN), fully-connected (FC), and recurrent (RNN) NNs are the three major workloads. To support all of them, previous solutions [1-3] use either area-inefficient heterogeneous architectures, including CNN and RNN cores, or an energy-inefficient reconfigurable architecture. A block-circulant algorithm [4] can unify CNN/FC/RNN workloads with transpose-domain acceleration, as shown in Fig. 7.5.1. Once NN weights are trained using the block-circulant pattern, all workloads are transformed into consistent matrix-vector multiplications (MVM), which can potentially achieve 8-to-128× storage savings and an O(n²)-to-O(n log n) reduction in computation complexity.
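The complexity claim can be illustrated with a minimal software sketch. It assumes the transform is a radix-2 FFT, which is a common choice for circulant matrices; the abstract does not name the transform, and this is not the processor's datapath. All function names below are hypothetical. Each b×b circulant block is stored as one length-b vector, so storage shrinks by a factor of b, and the block's matrix-vector product becomes an element-wise product in the transformed domain.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circulant_matvec_direct(c, x):
    """O(n^2) reference: c is the first column, so C[i][j] = c[(i - j) mod n]."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def circulant_matvec_fft(c, x):
    """O(n log n) version: transform, multiply element-wise, transform back."""
    n = len(c)
    prod = [a * b for a, b in zip(fft(c), fft(x))]
    return [v.real / n for v in fft(prod, inverse=True)]


def block_circulant_matvec(blocks, x, b):
    """blocks[p][q] is the length-b defining vector of circulant block (p, q)."""
    # Transform each input segment once and reuse it for every block row.
    xs = [fft(x[q * b:(q + 1) * b]) for q in range(len(blocks[0]))]
    out = []
    for row in blocks:
        acc = [0j] * b
        for q, c in enumerate(row):
            # Accumulate in the transformed domain; one inverse per block row.
            acc = [a + ck * xk for a, ck, xk in zip(acc, fft(c), xs[q])]
        out.extend(v.real / b for v in fft(acc, inverse=True))
    return out
```

In this sketch the weight transforms `fft(c)` are recomputed on every call for brevity; since trained weights are fixed, they could equally be transformed once ahead of time.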