ACBN: Approximate Calculated Batch Normalization for Efficient DNN On-Device Training Processor

Baoting Li, Hang Wang, Fujie Luo, Xuchong Zhang, Hongbin Sun, Nanning Zheng
DOI: https://doi.org/10.1109/tvlsi.2023.3262787
2023-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Batch normalization (BN) has become a well-established component in deep learning, substantially accelerating the convergence of deep neural network (DNN) training. Nevertheless, its hardware architecture has received little attention in the field of DNN on-device training processors. Previous designs incur either heavy off-chip memory traffic or high circuit complexity, and are therefore deficient in hardware efficiency and performance. This article proposes approximately calculated BN (ACBN) to achieve a much better tradeoff between hardware efficiency and performance for DNN on-device training processors. The accuracy and convergence rate of ACBN have been extensively evaluated on four typical DNN models. Compared with the state-of-the-art reference design, hardware simulation results show that ACBN reduces floating-point operations by at least 22.2% and external memory access by 33.3% on average. Moreover, ACBN introduces 63.6% data sparsity on average in the backward propagation of the BN layers of VGG16. To the best of our knowledge, we are the first to introduce data sparsity into the backward propagation of BN layers. The ACBN module is implemented on a Zynq UltraScale+ ZCU102 system-on-chip (SoC) field-programmable gate array (FPGA). Compared with the reference design, the ACBN hardware module saves 33.9% of the look-up tables (LUTs), 49.4% of the flip-flops (FFs), and 75% of the digital signal processors (DSPs), and reduces power by 12.4%, while achieving better performance.
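For context, the sketch below shows conventional training-mode BN for a single channel, i.e. the baseline computation that ACBN approximates; it is not the ACBN method, whose approximation the abstract does not describe. It assumes the standard formulation (biased batch variance, a small `eps`, learnable `gamma` and `beta`), and the function names `bn_forward` and `bn_backward` are hypothetical. It illustrates why BN is awkward for hardware: the forward pass needs the batch mean and variance before any output can be normalized, and the backward pass needs two full reductions, `sum(dy)` and `sum(dy * x_hat)`, before any input gradient can be formed, so a straightforward implementation reads the channel's data several times.

```python
import math


def bn_forward(x, gamma, beta, eps=1e-5):
    """Conventional training-mode batch normalization for one channel.

    x holds every activation of that channel across the mini-batch
    (and spatial positions, for a conv layer). Returns the output and
    a cache of values reused by the backward pass.
    """
    n = len(x)
    mu = sum(x) / n                                # reduction 1: batch mean
    var = sum((v - mu) ** 2 for v in x) / n        # reduction 2: biased batch variance
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mu) * inv_std for v in x]        # normalize only after both reductions
    y = [gamma * h + beta for h in x_hat]          # learnable scale and shift
    return y, (x_hat, inv_std, gamma)


def bn_backward(dy, cache):
    """Gradients of the loss w.r.t. x, gamma and beta for one channel."""
    x_hat, inv_std, gamma = cache
    n = len(dy)
    # Both reductions must see the whole channel before dx can be computed.
    dbeta = sum(dy)
    dgamma = sum(g * h for g, h in zip(dy, x_hat))
    # dx_i = gamma * inv_std / n * (n * dy_i - sum(dy) - x_hat_i * sum(dy * x_hat))
    scale = gamma * inv_std / n
    dx = [scale * (n * g - dbeta - h * dgamma) for g, h in zip(dy, x_hat)]
    return dx, dgamma, dbeta
```

A finite-difference check (perturbing each input and comparing the change in a weighted sum of the outputs against `dx`) is a quick way to confirm such a reference model before using it to count operations or memory passes.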