DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture

Liu Liu, Zheng Qu, Lei Deng, Fengbin Tu, Shuangchen Li, Xing Hu, Zhenyu Gu, Yufei Ding, Yuan Xie
DOI: https://doi.org/10.1109/micro50266.2020.00066
2020-10-01
Abstract: Deep Neural Networks (DNNs) have become the mainstream approach in Machine Learning applications. However, deploying DNNs on modern hardware under stringent latency requirements and energy constraints is challenging because of the compute-intensive and memory-intensive execution patterns of various DNN models. We propose an algorithm-architecture co-design to boost DNN execution efficiency. Leveraging the noise resilience of nonlinear activation functions in DNNs, we propose dual-module processing, which uses approximate modules learned from the original DNN layers to compute insensitive activations. Therefore, we avoid the expensive exact computations and data accesses that these insensitive activations do not need. We then design an Executor-Speculator dual-module architecture with support for balanced execution and memory access reduction. With acceptable model inference quality degradation, our accelerator design achieves 2.24x speedup and 1.97x energy efficiency improvement for compute-bound Convolutional Neural Networks (CNNs) and memory-bound Recurrent Neural Networks (RNNs).
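The abstract does not say how the approximate modules are built or how sensitivity is decided, so the following is only a minimal software sketch of dual-module processing for one fully connected layer, under stated assumptions. The approximate module is stood in for by weights rounded to a coarse grid (a hypothetical substitute; DUET learns its approximate modules from the original layers). An activation is treated as insensitive when the cheap pre-activation lands where the nonlinearity is flat: clearly negative for ReLU, saturated for tanh. The names `quantize`, `safe_margin`, `dual_module_relu`, and `dual_module_tanh` are illustrative, not taken from the paper, and "Speculator"/"Executor" here are loose software analogies to the hardware units.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def dot(row: Vector, x: Vector) -> float:
    """Full-precision inner product (analogous to the Executor's expensive path)."""
    return sum(w * v for w, v in zip(row, x))


def quantize(w: Matrix, step: float) -> Matrix:
    """Hypothetical approximate module: every weight rounded to a coarse grid."""
    return [[round(v / step) * step for v in row] for row in w]


def safe_margin(x: Vector, step: float) -> float:
    """Bound on |approx - exact| for one output: each weight is off by at most
    step/2, so the pre-activation error is at most (step/2) * sum(|x_j|)."""
    return 0.5 * step * sum(abs(v) for v in x)


def dual_module_relu(w: Matrix, w_approx: Matrix, x: Vector,
                     margin: float) -> Tuple[Vector, int]:
    """Screen every output cheaply (Speculator-like step), then recompute
    exactly only the outputs that might be positive (ReLU-sensitive ones)."""
    out, n_exact = [], 0
    for row, row_q in zip(w, w_approx):
        if dot(row_q, x) < -margin:
            out.append(0.0)  # insensitive: ReLU clamps it to zero anyway
        else:
            n_exact += 1
            out.append(max(0.0, dot(row, x)))  # sensitive: exact path
    return out, n_exact


def dual_module_tanh(w: Matrix, w_approx: Matrix, x: Vector,
                     saturation: float) -> Tuple[Vector, int]:
    """For saturating activations (tanh here), keep the cheap value when it is
    deep in the flat region and recompute only near the linear region."""
    out, n_exact = [], 0
    for row, row_q in zip(w, w_approx):
        a = dot(row_q, x)
        if abs(a) > saturation:
            out.append(math.tanh(a))  # insensitive: tanh is nearly flat here
        else:
            n_exact += 1
            out.append(math.tanh(dot(row, x)))  # sensitive: exact path
    return out, n_exact


if __name__ == "__main__":
    rng = random.Random(0)
    w = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(64)]
    x = [rng.uniform(0, 1) for _ in range(16)]
    step = 0.25
    w_q = quantize(w, step)
    margin = safe_margin(x, step)

    y, n_exact = dual_module_relu(w, w_q, x, margin)
    ref = [max(0.0, dot(row, x)) for row in w]
    print(f"ReLU: exact outputs {n_exact}/{len(w)}, "
          f"max error {max(abs(a - b) for a, b in zip(y, ref)):.2e}")

    z, n_exact_t = dual_module_tanh(w, w_q, x, saturation=margin + 1.0)
    ref_t = [math.tanh(dot(row, x)) for row in w]
    print(f"tanh: exact outputs {n_exact_t}/{len(w)}, "
          f"max error {max(abs(a - b) for a, b in zip(z, ref_t)):.2e}")
```

With `margin` set to the worst-case bound from `safe_margin`, the ReLU path reproduces the exact layer while skipping the rows it can prove are negative. Shrinking `margin` (or `saturation`) skips more exact work at the cost of some output error, which loosely mirrors the quality-versus-efficiency trade-off the abstract describes.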