An Efficient Hardware Architecture for DNN Training by Exploiting Triple Sparsity

Jian Huang, Jinming Lu, Zhongfeng Wang
DOI: https://doi.org/10.1109/iscas48785.2022.9937266
2022-01-01
Abstract: On-device DNN training has recently attracted much attention because of its high performance on edge devices and its strong ability to protect user privacy. Resource-limited devices call for low-power, high-throughput implementations of DNN training. In this paper, we present an efficient hardware accelerator that exploits triple sparsity to reduce the number of unnecessary operations during DNN training. A gradient pruning algorithm is employed to introduce error sparsity. First, sparse data are represented in a compressed sparse block format, which suits the different memory access patterns of all training phases. Second, we propose an efficient sparsity detection logic based on this storage format, which adopts a two-level granularity mechanism: coarse-grained mask-matching units are reused to improve energy efficiency, while fine-grained mask-matching units let the processing elements (PEs) work independently to enhance throughput. Third, based on this sparsity detection logic, we propose an efficient architecture for DNN training. Experimental results show that our design achieves up to 42.1 TOPS in throughput and 174.0 TOPS/W in energy efficiency. Its energy efficiency is $2.12\times$ higher than that of the state-of-the-art training processor. For training a ResNet-50 model on the CIFAR-10 dataset, our design achieves energy efficiencies of 14.10, 96.57, and 84.43 TOPS/W in the FP, BP, and WG phases, respectively.
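The two-level mask matching described in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design or its exact compressed sparse block format, whose details the abstract does not give. It assumes a hypothetical layout in which each fixed-size block carries one coarse bit (block is non-empty) and one fine bitmask (which elements are non-zero), with only the non-zero values stored. The names `BLOCK`, `compress`, and `sparse_dot` are illustrative.

```python
BLOCK = 4  # hypothetical block size, chosen only for illustration


def compress(vec, block=BLOCK):
    """Split vec into blocks and keep only the non-zero data.

    Returns (coarse, fine, values):
      coarse: one bit per block, 1 if the block has any non-zero element
      fine:   per non-empty block, a bitmask of its non-zero positions
      values: per non-empty block, its non-zero elements in order
    """
    coarse, fine, values = [], [], []
    for start in range(0, len(vec), block):
        chunk = vec[start:start + block]
        mask = [1 if x != 0 else 0 for x in chunk]
        nonempty = 1 if any(mask) else 0
        coarse.append(nonempty)
        if nonempty:
            fine.append(mask)
            values.append([x for x in chunk if x != 0])
    return coarse, fine, values


def sparse_dot(a, b):
    """Dot product of two compressed vectors using two-level mask matching.

    Returns (result, macs), where macs counts the multiplications performed.
    """
    a_coarse, a_fine, a_vals = a
    b_coarse, b_fine, b_vals = b
    total, macs = 0, 0
    ia = ib = 0  # indices into the stored (non-empty) blocks of a and b
    for ca, cb in zip(a_coarse, b_coarse):
        # Coarse-grained match: skip the block unless both sides are non-empty.
        if ca and cb:
            pa = pb = 0  # positions inside the packed value lists
            for ma, mb in zip(a_fine[ia], b_fine[ib]):
                # Fine-grained match: multiply only where both bits are set.
                if ma and mb:
                    total += a_vals[ia][pa] * b_vals[ib][pb]
                    macs += 1
                pa += ma
                pb += mb
        ia += ca
        ib += cb
    return total, macs


x = [0, 0, 0, 0, 1, 0, 2, 0, 0, 3, 0, 0]
w = [5, 0, 0, 1, 0, 4, 6, 0, 0, 0, 0, 0]
result, macs = sparse_dot(compress(x), compress(w))
print(result, macs)
```

In this example a dense dot product would need 12 multiplications, while the sketch performs only one: two of the three blocks are rejected by the coarse masks, and the fine masks find a single overlapping position in the remaining block.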