Splittable pattern-specific weight pruning for deep neural networks

Yiding Liu, Yinglei Teng, Tao Niu
DOI: https://doi.org/10.1109/ICME55011.2023.00249
2023-01-01
Abstract: Network pruning is one of the dominant model compression methods today and can be broadly divided into filter pruning and weight pruning. Unlike filter pruning, which deletes whole filters and is therefore prone to unrecoverable accuracy loss, weight pruning removes individual weights at a fine-grained level, which effectively avoids this problem. However, weight pruning leads to unstructured sparsity and is thus incompatible with general platforms. To address this limitation, we propose Splittable Pattern-Specific Weight Pruning (SPWP), which achieves both compression and compatibility and consists of Patterned Weight Searching (PWS) and Kernel Equivalent Splitting (KES). Specifically, we study the intrinsic features of convolution kernels and devise PWS to prune weights in regular shapes based on these features. During inference, KES equivalently splits each pruned sparse kernel into parallel branches according to the linear additivity of convolution, allowing the network to be accelerated on general platforms. Extensive experiments with different models on various datasets demonstrate the superior performance of our method. For example, SPWP can prune 60.1% of the total FLOPs of ResNet-56 on CIFAR-10 while even increasing top-1 accuracy by 0.08%.
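The linear additivity that KES relies on can be illustrated with a small NumPy sketch. This is not the authors' implementation: the cross-shaped mask, the helper `conv2d_valid`, and the choice of a 1x3 and a 3x1 branch are assumptions made purely for illustration, and the abstract does not say which patterns PWS actually selects. The sketch only checks that a pattern-pruned 3x3 kernel gives the same output as the sum of two smaller dense branches.

```python
import numpy as np


def conv2d_valid(x, k):
    """Plain 2-D cross-correlation, stride 1, no padding (illustrative helper)."""
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))        # single-channel input feature map
dense = rng.standard_normal((3, 3))    # original dense 3x3 kernel

# Hypothetical cross-shaped pattern (an assumption, not taken from the paper).
mask = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]], dtype=float)
pruned = dense * mask                  # pattern-pruned sparse kernel

# Branch A: the middle row as a dense 1x3 kernel.
row_kernel = pruned[1:2, :]
# Branch B: the middle column as a 3x1 kernel; the centre weight is zeroed
# because branch A already accounts for it.
col_kernel = pruned[:, 1:2].copy()
col_kernel[1, 0] = 0.0

# Reference: the sparse 3x3 kernel applied directly.
reference = conv2d_valid(x, pruned)

# Each branch sees the input cropped so that its output aligns with the
# 3x3 output grid; the branch outputs are then summed.
branch_a = conv2d_valid(x[1:-1, :], row_kernel)
branch_b = conv2d_valid(x[:, 1:-1], col_kernel)
split = branch_a + branch_b

max_abs_diff = np.max(np.abs(reference - split))
print("outputs match:", np.allclose(reference, split))
```

Because convolution is linear in the kernel, any decomposition of the pruned kernel into pieces that sum back to it yields the same output. A practical version would map each branch onto a standard dense convolution layer so that general platforms can run it without sparse-kernel support.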