Joint Sparsity with Mixed Granularity for Efficient GPU Implementation

Chuliang Guo, Xingang Yan, Yufei Chen, He Li, Xunzhao Yin, Cheng Zhuo
DOI: https://doi.org/10.23919/DATE51398.2021.9473939
2021-01-01
Abstract: Given the over-parameterization of recent deep neural networks, sparsification is widely used to compress networks and reduce their memory footprint. Unstructured sparsity, i.e., fine-grained pruning, helps preserve model accuracy, while structured sparsity, i.e., coarse-grained pruning, is preferred for general-purpose hardware such as GPUs. This paper proposes a novel joint sparsity pattern that uses mixed granularity to take advantage of both unstructured and structured sparsity. We use a heuristic strategy to infer the joint sparsity pattern by mixing vector-wise fine-grained and block-wise coarse-grained pruning masks. Experimental results show that, for VGG-16 on CIFAR-100, joint sparsity achieves higher model accuracy and sparsity ratio than the commonly used block sparsity and balanced sparsity strategies while consistently maintaining moderate inference speed.
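
To make the idea of mixing pruning masks concrete, here is a minimal sketch in plain Python. It is not the authors' heuristic, which the abstract does not spell out. The function names (`block_mask`, `vector_mask`, `joint_mask`, `sparsity`), the L1-norm block ranking, the top-k-per-vector rule, and the element-wise AND used to combine the two masks are all illustrative assumptions.

```python
def block_mask(w, block, keep_blocks):
    """Coarse-grained mask: keep the `keep_blocks` tiles with the largest L1 norm.
    (Ranking tiles by L1 norm is an assumption made for this sketch.)"""
    rows, cols = len(w), len(w[0])
    scores = []
    for br in range(0, rows, block):
        for bc in range(0, cols, block):
            norm = sum(abs(w[r][c])
                       for r in range(br, min(br + block, rows))
                       for c in range(bc, min(bc + block, cols)))
            scores.append((norm, br, bc))
    scores.sort(key=lambda s: s[0], reverse=True)
    mask = [[0] * cols for _ in range(rows)]
    for _, br, bc in scores[:keep_blocks]:
        for r in range(br, min(br + block, rows)):
            for c in range(bc, min(bc + block, cols)):
                mask[r][c] = 1
    return mask


def vector_mask(w, vec_len, keep_per_vec):
    """Fine-grained mask: within every row vector of length `vec_len`,
    keep the `keep_per_vec` entries with the largest magnitude."""
    rows, cols = len(w), len(w[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for start in range(0, cols, vec_len):
            idx = list(range(start, min(start + vec_len, cols)))
            idx.sort(key=lambda c: abs(w[r][c]), reverse=True)
            for c in idx[:keep_per_vec]:
                mask[r][c] = 1
    return mask


def joint_mask(coarse, fine):
    """Combine the two masks with an element-wise AND.
    (Hypothetical combination rule, chosen only for illustration.)"""
    return [[a & b for a, b in zip(cr, fr)] for cr, fr in zip(coarse, fine)]


def sparsity(mask):
    """Fraction of weights that the mask prunes (sets to zero)."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return 1.0 - kept / total


# Toy 4x4 weight matrix with two dominant 2x2 tiles on the diagonal.
weights = [
    [0.90, -0.10, 0.05, 0.02],
    [0.30, -0.80, 0.01, 0.04],
    [0.02, 0.03, 0.70, -0.60],
    [0.01, 0.05, -0.20, 0.50],
]

coarse = block_mask(weights, block=2, keep_blocks=2)
fine = vector_mask(weights, vec_len=2, keep_per_vec=1)
joint = joint_mask(coarse, fine)
print(joint, sparsity(joint))
```

On this toy matrix the coarse mask alone prunes half of the weights and the joint mask prunes three quarters of them. Under the AND assumption, the kept weights always lie inside kept blocks, which is the kind of regularity a block-oriented GPU kernel can exploit.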