An Energy-Efficient Architecture for Accelerating Inference of Memory-Augmented Neural Networks

Jianxun Yang, Leibo Liu, Jin Zhang, Shaojun Wei, Shouyi Yin
DOI: https://doi.org/10.1109/nanoarch47378.2019.181289
2019-01-01
Abstract: Although recurrent neural networks (RNNs) have shown excellent performance in sequence-related applications such as speech recognition and image captioning, they struggle in cognitive reasoning tasks such as question answering and algorithm learning because of their limited memory capacity. To address this issue, memory-augmented neural networks (MANNs) have been proposed. They couple a neural network (usually an RNN) to an external memory that can be written and read, which gives them strong reasoning ability in cognitive applications. However, MANNs require numerous operations and memory accesses to interact with the external memory, which hinders their deployment on low-power devices. In this work, we propose an algorithm-hardware co-designed full-stack approach to accelerate MANN inference. First, we propose an operator scheduling mechanism that optimizes the computation flow of MANNs for high computational parallelism and inference efficiency. Second, we propose a tri-mode softmax computing scheme that reduces calculation overhead for MANNs with different accuracy and latency requirements. Finally, we design a reconfigurable architecture that efficiently implements each operator in MANNs for high inference speed and energy efficiency. Tested on the bAbI dataset, the proposed optimizations and architecture achieve a 1.28× improvement in energy efficiency for MANNs compared with a GPU implementation.
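The abstract attributes much of a MANN's cost to reading from its external memory. A minimal sketch of one such read follows. It assumes a content-based soft read in the style of end-to-end memory networks (a MANN family commonly evaluated on bAbI), not the exact operators of this paper. The function names `softmax`, `dot`, and `memory_read` are hypothetical. Every read scores all memory slots, normalizes the scores with a softmax, and forms a weighted sum over all slots. This is why operation count and memory traffic grow with memory size, and why softmax is a natural target for the paper's tri-mode scheme. The abstract does not describe the three modes, so only an exact softmax is shown.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Exact, numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_read(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Hypothetical content-based soft read over an external memory.

    Assumption: slot i is scored by dot(query, keys[i]), the scores are turned
    into attention weights by softmax, and the result is the weighted sum of
    values. Every slot is touched on every read.
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    # The query matches the first key more closely, so the read leans toward values[0].
    print(memory_read([1.0, 0.0], keys, values))
```

In this sketch, a memory of N slots with dimension d costs roughly 2·N·d multiply-accumulates plus N exponentials per read. A hardware design would aim to parallelize or cheapen exactly these steps.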