SMVAR: A Novel RNN Accelerator Based on Non-blocking Data Distribution Structure

Jinwei Xu, Jingfei Jiang, Shiyao Xu, Lei Gao
DOI: https://doi.org/10.1109/hpcc-dss-smartcity-dependsys60770.2023.00107
2023-01-01
Abstract: Recurrent neural networks (RNNs) have become common models in artificial intelligence for processing temporal sequence tasks such as speech recognition, text analysis, and natural language processing. To speed up RNN inference, previous research has proposed sparse model pruning techniques. However, the pruning rate of existing sparse pruning algorithms is constrained by pruning granularity and hardware friendliness. To approach the pruning rate of unstructured pruning, this paper proposes the Large Region Balanced Sparse (LRBS) pruning method, which does not restrict the shape of sub-matrices and effectively improves the pruning rate. Furthermore, we propose the Sparse Matrix Vector Multiplication Accelerator for RNNs (SMVAR), which adopts a non-blocking data distribution structure to efficiently execute irregular matrix multiplication over large regions. To further improve performance, SMVAR adjusts the pipeline between macro-operations at a fine granularity to reduce idle time in the compute components. In addition, based on the coarse-grained block characteristics of the LRBS algorithm, we exploit coarse-grained parallelism in the accelerator through a multiple compute unit (CU) structure. Experiments show that the pruning rate of LRBS is 1.25x-2.5x higher than that of existing pruning algorithms. Compared with existing work, SMVAR improves execution efficiency by 2.02x-35.9x in the same application scenario.
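The abstract does not say how LRBS defines its regions or selects weights, so the following is only a minimal sketch of the general idea of region-balanced sparsity, under stated assumptions. It assumes that a "large region" is a contiguous block of rows, that every region keeps the same number of largest-magnitude weights, and that those weights may sit anywhere inside their region (no fixed sub-matrix shape). Because each region ends up with an equal nonzero count, each could be handed to a separate compute unit with equal work. The function names (`region_balanced_prune`, `region_lists`, `spmv`), the row-block region shape, and the magnitude criterion are hypothetical and are not taken from the paper; the non-blocking data distribution structure itself is not modeled.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Entry = Tuple[int, int, float]  # (row, col, value) of one kept weight


def region_balanced_prune(w: Matrix, num_regions: int, sparsity: float) -> Matrix:
    """Hypothetical sketch: split rows into equal regions and keep the same
    number of largest-magnitude weights in every region, at any position."""
    rows, cols = len(w), len(w[0])
    if rows % num_regions:
        raise ValueError("row count must be divisible by num_regions")
    region_rows = rows // num_regions
    keep = round(region_rows * cols * (1.0 - sparsity))  # nonzeros per region
    pruned = [[0.0] * cols for _ in range(rows)]
    for r in range(num_regions):
        start = r * region_rows
        coords = [(i, j) for i in range(start, start + region_rows) for j in range(cols)]
        # Rank every position in the region by weight magnitude, largest first.
        coords.sort(key=lambda ij: abs(w[ij[0]][ij[1]]), reverse=True)
        for i, j in coords[:keep]:
            pruned[i][j] = w[i][j]
    return pruned


def region_lists(pruned: Matrix, num_regions: int) -> List[List[Entry]]:
    """Collect the nonzeros of each region as (row, col, value) triples."""
    region_rows = len(pruned) // num_regions
    out = []
    for r in range(num_regions):
        start = r * region_rows
        out.append([(i, j, v) for i in range(start, start + region_rows)
                    for j, v in enumerate(pruned[i]) if v != 0.0])
    return out


def spmv(regions: List[List[Entry]], x: List[float], rows: int) -> List[float]:
    """Sparse matrix-vector product; each region's list could feed one CU."""
    y = [0.0] * rows
    for entries in regions:
        for i, j, v in entries:
            y[i] += v * x[j]
    return y


if __name__ == "__main__":
    random.seed(0)
    w = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(8)]
    p = region_balanced_prune(w, num_regions=4, sparsity=0.75)
    regions = region_lists(p, 4)
    print([len(e) for e in regions])  # [8, 8, 8, 8]: equal work per region
```

Balancing the nonzero count per region, rather than fixing a sub-matrix pattern, is what lets the kept weights land in irregular positions while still giving each compute unit the same amount of work in this sketch.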