Systolic-Array Deep-Learning Acceleration Exploring Pattern-Indexed Coordinate-Assisted Sparsity for Real-Time On-Device Speech Processing

Shiwei Liu, Zihao Zhao, Yanhong Wang, Qiaosha Zou, Yiyun Zhang, C.-J. Richard Shi
DOI: https://doi.org/10.1145/3453688.3461530
2021-06-22
Abstract: This paper presents a hardware-software co-design for efficiently implementing sparse deep neural networks (DNNs) on a regular systolic array for real-time on-device speech processing. The weight pruning format, pattern-indexed coordinate-assisted (PICA) sparsity, extends pattern-based pruning to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs). It reduces index storage overhead while avoiding accuracy degradation. The proposed systolic accelerator leverages intrinsic data reuse and locality to accommodate PICA-based sparsity without complex data distribution networks, and it supports DNNs with different topologies. At a 16x reduction in model size, PICA sparsification reduces index storage overhead by 6.02x while still achieving a 20.7% WER on the TIMIT dataset. For the pruned WaveNet and LSTM, the accelerator achieves energy efficiencies of 0.62 and 2.69 TOPS/W, respectively, 1.7x to 10x higher than the state of the art.