A 4.69-TOPS/W Training, 2.34-μJ/Image Inference On-Chip Training Accelerator with Inference-Compatible Backpropagation and Design Space Exploration in 28-nm CMOS

Junyi Qian, Haitao Ge, Yicheng Lu, Weiwei Shan
DOI: https://doi.org/10.1109/jssc.2024.3451332
IF: 5.4
2024-01-01
IEEE Journal of Solid-State Circuits
Abstract: On-chip training (OCT) accelerators improve personalized recognition accuracy while preserving user privacy. However, previous OCT accelerators often incurred significant additional hardware cost to support retraining, even though inference is the primary use case. We propose an inference-pattern-compatible backpropagation (BP) circuit that enables the training process to reuse the inference hardware. To achieve high energy efficiency, we apply three hardware-friendly optimization methods that significantly reduce redundant computation and external memory access (EMA). Additionally, we propose a design space exploration (DSE) method to search for the optimal hardware configuration, which improves system performance while reducing design time. Fabricated in a 28-nm CMOS process, this single-core OCT chip can train all the layers of a neural network (NN), achieving a peak training efficiency of 4.69 tera-operations per second per watt (TOPS/W). It also achieves the lowest inference energy of 2.34 μJ/inf/image at a core voltage of 0.48 V and 40 MHz.