iFPNA: A Flexible and Efficient Deep Learning Processor in 28-nm CMOS Using a Domain-Specific Instruction Set and Reconfigurable Fabric

Chixiao Chen, Xindi Liu, Huwan Peng, Hongwei Ding, C.-J. Richard Shi
DOI: https://doi.org/10.1109/jetcas.2019.2914355
IF: 5.877
2019-01-01
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Abstract: This paper presents iFPNA, the instruction-and-fabric programmable neuron array: a deep learning processor that uses a neural-network-specific instruction set architecture and a reconfigurable fabric. The design is motivated by the trade-off between efficiency and flexibility in deep learning processor designs. The proposed architecture contains a controller for programming, a global feature buffer for data arrangement, and 16 reconfigurable neuron slices for computing. The controller uses dedicated instructions to control the feature buffer and the neuron slices. The neuron slices support multiplication-and-accumulation, non-linear activation, element-wise operations, and pooling with different bit-widths and kernel sizes. Different deep neural networks are mapped to iFPNA, such as conventional CNNs, depthwise/pointwise CNNs, LSTMs, and GRUs. The paper also discusses various data flow mapping schemes. An iFPNA prototype is designed and fabricated in 28-nm HPC CMOS technology. Measurement results show that iFPNA achieves a peak energy efficiency of 1.72 TOPS/W at a 30-MHz clock rate with a 0.63-V supply. The measured latency is 57.2 ms on AlexNet and 40 ms on LSTM-512 at a 125-MHz clock rate.