BitXpro: Regularity-Aware Hardware Runtime Pruning for Deep Neural Networks

Hongyan Li, Hang Lu, Haoxuan Wang, Shengji Deng, Xiaowei Li
DOI: https://doi.org/10.1109/tvlsi.2022.3221732
2023-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Classic deep neural network (DNN) pruning mostly relies on software-based methodologies to manage the accuracy/speed tradeoff, which involves complicated procedures such as critical-parameter searching, fine-tuning, and sparse training to find the best plan. In this article, we explore the opportunities of hardware runtime pruning and propose a regularity-aware hardware runtime pruning methodology, termed “BitXpro,” to enable versatile DNN inference. The method targets bit-level sparsity and the irregularity of that sparsity in the parameters, pinpointing and pruning the useless bits on-the-fly in the proposed BitXpro accelerator. The versatility of BitXpro lies in: 1) no software effort is required; 2) it is orthogonal to software-based pruning; and 3) it supports multiple precisions (both floating point and fixed point). Empirical studies on various domain-specific artificial intelligence (AI) tasks highlight the following results: 1) up to 8.27× speedup over the original nonpruned DNN and 10.81× speedup when combined with a software-pruned DNN; 2) up to 0.3% and 0.04% higher accuracy for the floating- and fixed-point DNNs, respectively; and 3) 6.01× and 8.20× performance improvement over state-of-the-art accelerators, with an area of 0.068 mm² and power consumption of 74.82 mW (32-bit floating point) and 40.44 mW (16-bit fixed point) under the TSMC 28-nm technology library.
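The core intuition behind bit-level sparsity, namely that many bits of a weight are zero and contribute nothing to a product, can be illustrated with a minimal sketch. It assumes a bit-serial shift-and-add multiplier over sign-magnitude 16-bit fixed-point weights that spends one cycle per nonzero weight bit. The function names `essential_bits` and `bit_serial_mac` are hypothetical; this is not the BitXpro accelerator's datapath, and it models neither its regularity-aware scheduling nor its floating-point support.

```python
from __future__ import annotations

# Hypothetical illustration of bit-level sparsity; NOT the BitXpro datapath.
# Assumes sign-magnitude fixed-point weights whose magnitude fits in `width` bits.


def essential_bits(w: int, width: int = 16) -> list[int]:
    """Return the positions of the 1-bits in the magnitude of weight w."""
    mag = abs(w) & ((1 << width) - 1)
    return [i for i in range(width) if (mag >> i) & 1]


def bit_serial_mac(acts: list[int], weights: list[int],
                   width: int = 16) -> tuple[int, int, int]:
    """Shift-and-add dot product that skips zero weight bits.

    Returns (result, dense_cycles, sparse_cycles):
      dense_cycles  - one cycle per bit position per weight (no pruning)
      sparse_cycles - one cycle per nonzero (essential) weight bit only
    """
    acc = 0
    dense_cycles = 0
    sparse_cycles = 0
    for a, w in zip(acts, weights):
        sign = -1 if w < 0 else 1
        bits = essential_bits(w, width)
        for i in bits:
            acc += sign * (a << i)  # one shift-and-add per essential bit
        dense_cycles += width
        sparse_cycles += len(bits)
    return acc, dense_cycles, sparse_cycles


if __name__ == "__main__":
    acts = [3, -2, 5, 1]
    weights = [4, -3, 0, 130]
    result, dense, sparse = bit_serial_mac(acts, weights)
    print(result, dense, sparse)  # 148 64 5
```

With these toy values the dot product is 148, and only 5 of the 64 bit-cycles involve a 1-bit, which is the kind of redundancy a runtime bit-pruning accelerator aims to skip. Real weight distributions, and the irregular placement of their 1-bits across lanes, determine how much of that saving hardware can actually realize.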