DSIRBS: A Layer-wise Balanced DNN Weight Pruning Method

Xifu Qian, Jingfei Jiang, Jinwei Xu, Caili Gao, Qisheng Xu
DOI: https://doi.org/10.1145/3587716.3587740
2023-01-01
Abstract: Compression has become an increasingly important step before deploying deep neural networks on resource-limited hardware. Previous unstructured pruning methods can produce high sparsity, but they suffer from data irregularity, which becomes the bottleneck for sparse-computation speedup and bandwidth saving. Share Index Row-Balanced Sparsity (SIRBS) is a compression method that addresses this problem by sharing indexes between the rows in one row cluster. When applying SIRBS to some DNN models, we observe that using a fixed row-cluster size across all layers ignores both the varying sizes of the layers and their differing significance. Therefore, building on past successful practices, this paper presents a three-stage layer-wise unstructured weight pruning framework called Dynamic Share Index Row-Balanced Sparsity (DSIRBS). DSIRBS searches for an appropriate row-cluster size for each layer in order to achieve a better trade-off between model performance and compression ratio. In the pruning training stage, the ADMM method is adopted to solve the sparsity-constrained optimization problem. For the search algorithm, we propose a criterion called pruning magnitude, which estimates the accuracy of a pruned model based on weight matrix similarity and layer significance and reduces the search time by about 88×. We implement DSIRBS on YOLOv5 and ResNet; the results show that DSIRBS outperforms SIRBS-4 in both compression ratio and model performance.
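To make the shared-index idea concrete, the following is a minimal sketch of one possible reading of row-balanced sparsity with indexes shared inside a row cluster. It is not the authors' implementation: the function name `shared_index_mask`, the parameters `cluster_size` and `keep_per_row`, and the column-scoring rule (summed absolute weight over the cluster) are all assumptions made for illustration, and the ADMM training stage and the pruning-magnitude search are not modeled.

```python
def shared_index_mask(weights, cluster_size, keep_per_row):
    """Build a keep/prune mask in which all rows of a cluster share column indexes.

    Assumption (not from the paper): a column's importance inside a cluster
    is the sum of absolute weights over the rows of that cluster.
    """
    rows, cols = len(weights), len(weights[0])
    if rows % cluster_size != 0:
        raise ValueError("row count must be a multiple of cluster_size")
    if not 0 < keep_per_row <= cols:
        raise ValueError("keep_per_row must be in 1..cols")

    mask = [[0] * cols for _ in range(rows)]
    shared_indexes = []  # one index list per cluster, not one per row
    for start in range(0, rows, cluster_size):
        block = weights[start:start + cluster_size]
        # Score every column by its total magnitude within this cluster.
        scores = [sum(abs(row[c]) for row in block) for c in range(cols)]
        ranked = sorted(range(cols), key=lambda c: scores[c], reverse=True)
        keep = sorted(ranked[:keep_per_row])
        shared_indexes.append(keep)
        # Every row in the cluster keeps the same columns (row-balanced).
        for r in range(start, start + cluster_size):
            for c in keep:
                mask[r][c] = 1
    return mask, shared_indexes


# Hypothetical 4x4 layer, two rows per cluster, two weights kept per row.
example = [
    [1.0, 0.0, 3.0, 0.0],
    [0.0, 2.0, 4.0, 0.0],
    [5.0, 0.0, 0.0, 1.0],
    [6.0, 0.0, 0.0, 2.0],
]
example_mask, example_indexes = shared_index_mask(example, 2, 2)
```

Under these assumptions, a larger `cluster_size` means fewer index lists to store (one per cluster instead of one per row) but forces more rows to agree on the same columns. That is the trade-off a per-layer cluster-size search would have to balance.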