An Energy-Efficient Mixed-Bitwidth Systolic Accelerator for NAS-Optimized Deep Neural Networks

Wei Mao, Liuyao Dai, Kai Li, Quan Cheng, Yuhang Wang, Laimin Du, Shaobo Luo, Mingqiang Huang, Hao Yu
DOI: https://doi.org/10.1109/tvlsi.2022.3210069
2022-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Optimized deep neural network (DNN) models and energy-efficient hardware designs are of great importance in edge-computing applications. Neural architecture search (NAS) methods are employed for DNN model optimization with mixed-bitwidth networks. To satisfy the resulting computation requirements, mixed-bitwidth convolution accelerators with low power and high throughput are highly desired. Several methods exist to support mixed-bitwidth multiply-accumulate (MAC) operations in DNN accelerator designs. The low-bitwidth-combination (LBC) method improves the low-bitwidth throughput, but at a large hardware cost. The high-bitwidth-split (HBS) method minimizes the additional logic gates needed for configuration, but its throughput in the low-bitwidth mode is poor. In this work, a bit-split-and-combination (BSC) systolic accelerator is proposed. The BSC-based MAC unit is designed to support mixed-bitwidth operations with the best overall performance. In addition, the inter-processing-element (inter-PE) systolic and intra-PE parallel dataflow not only improves throughput in mixed-bitwidth modes but also reduces the power consumed by data transmission. The proposed work is designed and synthesized in a 28-nm process. The BSC MAC unit achieves up to $2.08\times$ and $1.75\times$ higher energy efficiency than the HBS and LBC units, respectively. Compared with state-of-the-art accelerators, the proposed work also achieves high energy efficiency, with 20.02, 23.55, and 30.17 TOPS/W on the mixed-bitwidth VGG-16, ResNet-18, and LeNet-5 benchmarks at 0.6 V, respectively.
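The abstract does not describe the BSC circuit itself, so the following is only a minimal sketch of the general arithmetic idea behind split-and-combine multipliers, not the authors' design. It assumes unsigned operands and an 8-bit/4-bit pairing (both assumptions, as the abstract does not state the supported bitwidths), and the function names are hypothetical. It shows that four 4-bit multipliers can either be combined by shift-and-add into one 8-bit product or be used independently for four low-bitwidth MAC terms.

```python
def split_nibbles(x):
    """Split an unsigned 8-bit value into its (high, low) 4-bit halves."""
    return (x >> 4) & 0xF, x & 0xF


def mul8_from_4bit(a, b):
    """Unsigned 8x8 multiply assembled from four 4x4 partial products.

    Uses a = 16*a_hi + a_lo and b = 16*b_hi + b_lo, so
    a*b = 256*a_hi*b_hi + 16*(a_hi*b_lo + a_lo*b_hi) + a_lo*b_lo.
    """
    a_hi, a_lo = split_nibbles(a)
    b_hi, b_lo = split_nibbles(b)
    return ((a_hi * b_hi) << 8) + ((a_hi * b_lo + a_lo * b_hi) << 4) + a_lo * b_lo


def mac4_x4(acts, wts):
    """Low-bitwidth mode (illustrative): the same four 4x4 multipliers
    each compute an independent product, and the results are accumulated."""
    assert len(acts) == len(wts) == 4
    assert all(0 <= v < 16 for v in list(acts) + list(wts))
    return sum(a * w for a, w in zip(acts, wts))
```

In this toy model, the high-bitwidth mode yields one 8-bit product per step while the low-bitwidth mode yields four 4-bit products from the same multipliers, which is the kind of throughput trade-off the LBC, HBS, and BSC schemes balance against hardware cost.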