Exploring the Programmability for Deep Learning Processors: from Architecture to Tensorization

Chixiao Chen,Huwan Peng,Xindi Liu,Hongwei Ding,C. -J. Richard Shi
DOI: https://doi.org/10.1109/dac.2018.8465569
2018-01-01
Abstract:This paper presents an instruction and Fabric Programmable Neuron Array (iFPNA) architecture, its 28nm CMOS chip prototype, and a compiler for the acceleration of a variety of deep learning neural networks (DNNs) including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and fully connected (FC) networks on chip. The iFPNA architecture combines instruction-level programmability as in an Instruction Set Architecture (ISA) with logic-level reconfigurability as in a Field-Programmable Gate Array (FPGA) in a sliced structure for scalability. Four data flow models, namely weight stationary, input stationary, row stationary and tunnel stationary, are described as the abstraction of various DNN data and computational dependence. The iFPNA compiler partitions a large-size DNN to smaller networks, each being mapped to, optimized and code generated for, the underlying iFPNA processor using one or a mixture of the four data-flow models. Experimental results have shown that state-of-art large-size CNNs, RNNs, and FC networks can be mapped to the iFPNA processor achieving the near ASIC performance.
What problem does this paper attempt to address?