THETA: A High-Efficiency Training Accelerator for DNNs with Triple-Side Sparsity Exploration

Jinming Lu, Jian Huang, Zhongfeng Wang
DOI: https://doi.org/10.1109/tvlsi.2022.3175582
2022-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Training deep neural networks (DNNs) on edge devices has attracted increasing attention in real-world applications for domain adaptation and privacy protection. However, deploying DNN training on resource-limited edge devices is challenging because training involves massive computation and data movement. To address this issue, we propose an energy-efficient training accelerator that employs a hybrid compression strategy. The strategy exploits multiple forms of data redundancy to achieve true triple-side sparsity, which drastically reduces computational complexity with negligible accuracy loss across a range of transfer learning tasks. To facilitate triple-side zero-skipping operations during the different training stages, we first present a novel sparse data representation and a triple-sparsity index matching scheme. Second, we develop a sparse tensor processing unit (STPU) arranged in a hierarchical structure, which enables a flexible dataflow for processing convolutional (Conv) and fully connected (FC) layers with diverse computational patterns throughout the entire training process. Third, we design an auxiliary processing unit (APU) to execute postprocessing operations such as the rectified linear unit (ReLU) and on-the-fly pruning. Finally, the training accelerator is implemented in a Taiwan Semiconductor Manufacturing Company (TSMC) 28-nm process and evaluated on multiple benchmarks. The experimental results show that THETA achieves a throughput of 7.28–22.32 tera operations per second (TOPS) and an energy efficiency of 45.24–133.70 TOPS/W, reducing training time by 40–$72\times$ and energy consumption by 19–$63\times$ relative to dense training. Compared with the prior art, our design offers $1.6\times$ the throughput and $1.9\times$ the energy efficiency.