Winograd-CNN Accelerator Compatible with Sparse and Non-sparse Models.

Huazhen Li, Weiting Chen, Jiangtao Wang
DOI: https://doi.org/10.1109/hpcc-dss-smartcity-dependsys53884.2021.00105
2021-01-01
Abstract: The inference speed of convolutional neural networks is a key factor in their industrial application. Acceleration of non-sparse models and acceleration of sparse models are currently studied as two independent research directions. In this paper, we propose a hybrid granular pruning method and a fixed-length binary pruning index scheme, and design a compatible acceleration architecture based on them. The hybrid granular pruning method performs coarse-grained pruning first, followed by fine-grained pruning. The binary pruning index scheme uses a simple fixed-length binary mask to index valid weight positions whether or not the model is pruned, which makes it possible for one accelerator to support both non-sparse models without pruning and sparse models with hybrid granular pruning. The acceleration architecture uses a new data layout and a buffer-pipeline mechanism for data buffering, and converts the Winograd algorithm into matrix operations with parallelism exploited in three aspects. Experimental results suggest that the proposed solution offers not only better acceleration performance but also excellent compatibility: it can be applied directly to models at different pruning levels without recompilation.
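As a rough illustration of the two ideas named in the abstract (Winograd expressed as matrix operations, and a fixed-length binary mask that covers both pruned and unpruned weights), the following minimal sketch computes one Winograd F(2x2, 3x3) output tile with NumPy. The transform matrices are the standard ones for F(2x2, 3x3). Everything else is an assumption made for illustration only: the function names `winograd_tile`, `direct_tile`, and `pack_mask` are hypothetical, and applying a 16-entry mask to the transformed 4x4 kernel is just one plausible reading; the abstract does not specify the paper's mask layout or the domain in which pruning is applied.

```python
import numpy as np

# Standard transform matrices for Winograd F(2x2, 3x3).
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=np.float64)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]], dtype=np.float64)
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=np.float64)


def winograd_tile(d, g, mask=None):
    """Compute a 2x2 output tile from a 4x4 input tile d and a 3x3 kernel g.

    mask is an optional 4x4 array of 0/1 values (assumed layout, not the
    paper's). An all-ones mask represents a non-sparse model, so the same
    code path handles both cases.
    """
    U = G @ g @ G.T              # transformed kernel, 4x4
    if mask is not None:
        U = U * mask             # zero out pruned positions
    V = B_T @ d @ B_T.T          # transformed input tile, 4x4
    return A_T @ (U * V) @ A_T.T  # element-wise product, then output transform


def direct_tile(d, g):
    """Reference: direct 3x3 sliding-window correlation over the 4x4 tile."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
    return out


def pack_mask(mask):
    """Pack a 4x4 0/1 mask into 2 bytes; the length is fixed for any sparsity."""
    return np.packbits(mask.astype(np.uint8).ravel())


rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
dense_mask = np.ones((4, 4))
```

With `dense_mask`, `winograd_tile(d, g, dense_mask)` matches `direct_tile(d, g)` up to floating-point error, and `pack_mask` returns two bytes whether the mask is dense or sparse, which is the property that lets one data path serve both kinds of model in this sketch.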