A 3.89-GOPS/mW Scalable Recurrent Neural Network Processor with Improved Efficiency on Memory and Computation

Jiaquan Wu, Feiteng Li, Zhijian Chen, Xiaoyan Xiang
DOI: https://doi.org/10.1109/tvlsi.2019.2927375
2019-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Recurrent neural networks (RNNs) perform excellently on sequencing tasks but are severely restricted by the complex computations and intensive memory consumption due to their internal fully connected topologies, thereby making it a great challenge to implement RNNs on embedded devices. In this brief, we propose an energy-efficient RNN processor by exploiting the data locality in network compression using an innovative quantified sparse matrix encoding format. Compared with the conventional processors for compressed RNNs, more than 80% of the weight fetching and matrix–vector multiplications can be further reduced in applications, such as natural language and keyword spotting. To handle different scales of RNN models without introducing significant interactive overhead, scalable hardware architecture is presented to organize multiple processor engines in a spatial fashion with the assistance of the network cross-division strategy. Synthesized in the SMIC 40LL CMOS process, the prototype processor has a total area of 0.65 mm² with 95.5 kB of static random-access memory capacity. Based on the simulation, this processor achieves a peak performance of 24 GOPS and dissipates 6.16-mW power with 1.1 V supply and 200 MHz. The peak energy efficiency reaches 3.89 GOPS/mW, which is state of the art among existing RNN accelerators.