9.2 A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error-Compensation-Based Prediction

Huiyu Mo, Wenping Zhu, Wenjing Hu, Guangbin Wang, Qiang Li, Ang Li, Shouyi Yin, Shaojun Wei, Leibo Liu
DOI: https://doi.org/10.1109/ISSCC42613.2021.9365943
2021-01-01
Abstract: To deploy convolutional neural networks (CNNs) on edge devices efficiently, most existing CNN processors were built on quantized CNNs to optimize inference. However, three issues (Fig. 9.2.1) have not been well addressed: 1) duplicate weights in each kernel after quantization, which yield repetitive multiplications; 2) a huge number of unnecessary MACs caused by ReLU activation functions; 3) frequent off-chip memory access in residual blocks.
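The first issue can be illustrated with a small sketch. It is not the paper's hardware dataflow, only the general arithmetic idea that activations sharing the same quantized weight value can be summed first and multiplied once. The function names `dot_naive` and `dot_grouped`, the kernel values, and the activation values are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def dot_naive(weights, activations):
    """Plain dot product: one multiplication per weight."""
    total = sum(w * a for w, a in zip(weights, activations))
    return total, len(weights)


def dot_grouped(weights, activations):
    """Sum activations that share a weight value, then multiply
    once per distinct nonzero weight value."""
    sums = defaultdict(int)
    for w, a in zip(weights, activations):
        if w != 0:  # zero weights contribute nothing
            sums[w] += a
    total = sum(w * s for w, s in sums.items())
    return total, len(sums)


# Hypothetical 3x3 kernel, quantized to a few levels and flattened.
weights = [1, -2, 1, 0, 3, -2, 1, 3, 0]
# Hypothetical input activations under the kernel window.
activations = [5, 2, 7, 9, 1, 4, 3, 6, 8]

naive_total, naive_mults = dot_naive(weights, activations)
grouped_total, grouped_mults = dot_grouped(weights, activations)

# Both variants produce the same result with different multiply counts.
assert naive_total == grouped_total
print(naive_total, naive_mults, grouped_mults)
```

In this toy case the nine weights take only three distinct nonzero values, so the grouped variant needs three multiplications instead of nine, at the cost of extra additions and bookkeeping. How much this helps in practice depends on the quantization bit width and kernel size.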