Layer-adaptive sparsity for the Magnitude-based Pruning

Jaeho Lee, Sejun Park, Sangwoo Mo, Sungsoo Ahn, Jinwoo Shin
DOI: https://doi.org/10.48550/arXiv.2010.07611
2021-05-09
Abstract: Recent discoveries on neural network pruning reveal that, with a carefully chosen layerwise sparsity, a simple magnitude-based pruning achieves state-of-the-art tradeoff between sparsity and performance. However, without a clear consensus on "how to choose," the layerwise sparsities are mostly selected algorithm-by-algorithm, often resorting to handcrafted heuristics or an extensive hyperparameter search. To fill this gap, we propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score; the score is a rescaled version of weight magnitude that incorporates the model-level $\ell_2$ distortion incurred by pruning, and does not require any hyperparameter tuning or heavy computation. Under various image classification setups, LAMP consistently outperforms popular existing schemes for layerwise sparsity selection. Furthermore, we observe that LAMP continues to outperform baselines even in weight-rewinding setups, while the connectivity-oriented layerwise sparsity (the strongest baseline overall) performs worse than a simple global magnitude-based pruning in this case. Code: https://github.com/jaeho-lee/layer-adaptive-sparsity
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to choose the layer-wise sparsity for magnitude-based pruning. Simple magnitude-based pruning can achieve a state-of-the-art tradeoff between sparsity and performance when the layer-wise sparsity is chosen carefully, but there is no clear consensus on how to make that choice. In most cases, the layer-wise sparsity is set algorithm by algorithm, using handcrafted heuristics or an extensive hyperparameter search. The authors therefore propose a new importance score for global pruning, the LAMP (Layer-Adaptive Magnitude-based Pruning) score, which determines the sparsity of each layer automatically, without hyperparameter tuning or heavy computation.

### Specific Problem Description

1. **Background and Motivation**:
   - Neural network pruning removes "unimportant" weights in order to meet practical constraints, alleviate overfitting, enhance interpretability, or deepen the understanding of neural network training.
   - Magnitude-based pruning (MP) can achieve very good results when the layer-wise sparsity is chosen appropriately.
   - However, there is no unified criterion for choosing the layer-wise sparsity; it is usually set through handcrafted heuristics or an extensive hyperparameter search.
2. **Shortcomings of Existing Methods**:
   - Most current methods rely on algorithm-specific handcrafted heuristics or extensive hyperparameter searches, which makes the choice of layer-wise sparsity complex and unreliable.
   - For example, some methods keep the first convolutional layer completely dense and prune at most 80% of the weights in the last fully-connected layer.
3. **Proposed New Method**:
   - The authors propose a new importance score for global pruning, the LAMP score.
   - The LAMP score is a rescaled weight magnitude that takes into account the model-level ℓ2 distortion caused by pruning, and it requires no hyperparameter tuning or heavy computation.
   - The LAMP score selects the sparsity of each layer automatically, which simplifies the pruning process and improves performance.

### Key Points of the Solution

- **Definition of the LAMP Score**:
  \[ \text{score}(u; W) := \frac{(W[u])^2}{\sum_{v \geq u} (W[v])^2} \]
  where \(W\) is the weight tensor of a layer flattened into a vector, \(W[u]\) is the weight at index \(u\), and the indices are sorted in ascending order of weight magnitude, so that \(|W[u]| \leq |W[v]|\) whenever \(u < v\). The sum in the denominator therefore runs over the target weight and all weights in the same layer that are at least as large.
- **Function of the LAMP Score**:
  - The LAMP score measures the relative importance of the target connection among all surviving connections in its layer, that is, the connections that remain once every smaller weight in that layer has been pruned.
  - Pruning the connections with the smallest LAMP scores globally until the global sparsity constraint is met is equivalent to performing magnitude-based pruning with automatically selected layer-wise sparsity.
- **Experimental Verification**:
  - The authors conducted experiments on multiple convolutional neural network architectures (VGG-16, ResNet-18/34, DenseNet-121, EfficientNet-B0) and image datasets (CIFAR-10/100, SVHN, Restricted ImageNet).
  - The experimental results show that LAMP outperforms existing layer-wise sparsity selection schemes in various settings.

In summary, the paper addresses how to choose the layer-wise sparsity for magnitude-based pruning, and the LAMP score provides an automatic, efficient way to do so without hyperparameter tuning.
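
The LAMP score and the global pruning rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: the function names `lamp_scores` and `lamp_prune_masks` are hypothetical, each layer is assumed to be a plain NumPy array, and ties between equal scores are broken arbitrarily by sort order.

```python
import numpy as np


def lamp_scores(weights):
    """Return LAMP scores with the same shape as `weights` (one layer)."""
    sq = weights.ravel().astype(np.float64) ** 2
    # Sort squared weights in ascending order, as the score definition assumes.
    order = np.argsort(sq, kind="stable")
    sorted_sq = sq[order]
    # Suffix sums: the denominator at sorted index u is sum_{v >= u} W[v]^2.
    suffix = np.cumsum(sorted_sq[::-1])[::-1]
    scores_sorted = np.divide(
        sorted_sq, suffix, out=np.zeros_like(sorted_sq), where=suffix > 0
    )
    # Undo the sort so that each score lines up with its original weight.
    scores = np.empty_like(scores_sorted)
    scores[order] = scores_sorted
    return scores.reshape(weights.shape)


def lamp_prune_masks(layers, sparsity):
    """Return one boolean keep-mask per layer (True = keep the weight).

    The `sparsity` fraction of all weights with the smallest LAMP scores
    is pruned globally, across layers.
    """
    all_scores = np.concatenate([lamp_scores(w).ravel() for w in layers])
    n_prune = int(sparsity * all_scores.size)
    keep = np.ones(all_scores.size, dtype=bool)
    keep[np.argsort(all_scores, kind="stable")[:n_prune]] = False
    # Split the flat mask back into per-layer masks.
    masks, start = [], 0
    for w in layers:
        masks.append(keep[start:start + w.size].reshape(w.shape))
        start += w.size
    return masks


layers = [np.array([1.0, 2.0, 3.0]), np.array([10.0, 20.0])]
masks = lamp_prune_masks(layers, sparsity=0.4)
print([m.tolist() for m in masks])  # [[False, True, True], [False, True]]
```

In this toy example the scores are 1/14, 4/13, and 1 for the first layer and 0.2 and 1 for the second, so pruning 40% of the weights removes the smallest weight from each layer. Plain global magnitude pruning would instead remove both 1.0 and 2.0 from the first layer, which illustrates how the rescaling changes the resulting layer-wise sparsity.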