Timing Error Tolerant CNN Accelerator with Layer-Wise Approximate Multiplication

Bo Liu, Na Xie, Qingwen Wei, Guang Yang, Chonghang Xie, Weiqiang Liu, Hao Cai
DOI: https://doi.org/10.1109/tcad.2024.3395984
IF: 2.9
2024-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: By exploiting the error tolerance of computation, approximate circuits have emerged as a computing paradigm for increasing the energy efficiency of digital systems, which is crucial in high-performance, low-power systems for edge Internet-of-Things (EIoT) devices. Inspired by state-of-the-art high-efficiency NN accelerators, three techniques are proposed for integrating approximate computing units into a CNN accelerator to achieve a dynamic energy-accuracy trade-off: (1) An approximate multiplier that can be configured to three precision modes is proposed, with a weight pre-encoding method used to reduce hardware overhead. (2) For hybrid-accuracy layer-wise mapping, Hessian-aware layer-wise accuracy scaling is proposed, which considers inference accuracy and hardware overhead simultaneously; a progressive re-training approach enables a more aggressive approximation configuration and higher power reduction. (3) A tensor multiplication unit (TMU) with a timing error detection and correction (TEDC) approach is proposed, enabling aggressive voltage scaling and a 41.5% power reduction. The resulting energy-efficient CNN accelerator shows how deep learning can be brought to EIoT devices by running each layer at its appropriate computational accuracy. Implemented in 28-nm CMOS technology, the CNN accelerator achieves an energy efficiency of 14.4 TOPS/W. Evaluated on keyword spotting with GSCD and on CIFAR10 and CIFAR100, the proposed accelerator and method save 44.5% to 46.7% of multiplication energy while reducing accuracy by less than 0.6%.
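The idea of running each layer at its own multiplication precision can be illustrated with a small software sketch. This is not the paper's multiplier: the abstract does not describe the circuit, so operand truncation stands in as a generic approximation, and the mode names, the dropped-bit counts in `MODE_DROPPED_BITS`, and the functions `approx_mul` and `layer_dot` are all hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical mode table: low-order bits zeroed in each operand magnitude.
# The abstract only says there are three precision modes; these values are
# illustrative assumptions, not the paper's configuration.
MODE_DROPPED_BITS: Dict[str, int] = {"high": 0, "medium": 2, "low": 4}


def approx_mul(a: int, w: int, mode: str) -> int:
    """Multiply two integers after truncating the low bits of each magnitude."""
    k = MODE_DROPPED_BITS[mode]
    sign = -1 if (a < 0) != (w < 0) else 1
    ta = (abs(a) >> k) << k  # zero the k least significant bits
    tw = (abs(w) >> k) << k
    return sign * ta * tw


def layer_dot(acts: Sequence[int], weights: Sequence[int], mode: str) -> int:
    """Dot product of one layer's operands using the layer's precision mode."""
    return sum(approx_mul(a, w, mode) for a, w in zip(acts, weights))


if __name__ == "__main__":
    acts, weights = [13, -20, 100], [7, 5, -3]
    exact = layer_dot(acts, weights, "high")
    for mode in MODE_DROPPED_BITS:
        approx = layer_dot(acts, weights, mode)
        print(f"{mode:>6}: result={approx:6d}  abs error={abs(exact - approx)}")
```

In a layer-wise scheme, a per-layer mode assignment (for example, chosen by a sensitivity metric such as the Hessian-aware scaling mentioned in the abstract) would select which `mode` each layer passes to `layer_dot`; error-tolerant layers take the cheaper modes.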