Abstract: Recent discoveries on neural network pruning reveal that, with a carefully chosen layerwise sparsity, a simple magnitude-based pruning achieves state-of-the-art tradeoff between sparsity and performance. However, without a clear consensus on "how to choose," the layerwise sparsities are mostly selected algorithm-by-algorithm, often resorting to handcrafted heuristics or an extensive hyperparameter search. To fill this gap, we propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score; the score is a rescaled version of weight magnitude that incorporates the model-level $\ell_2$ distortion incurred by pruning, and does not require any hyperparameter tuning or heavy computation. Under various image classification setups, LAMP consistently outperforms popular existing schemes for layerwise sparsity selection. Furthermore, we observe that LAMP continues to outperform baselines even in weight-rewinding setups, while the connectivity-oriented layerwise sparsity (the strongest baseline overall) performs worse than a simple global magnitude-based pruning in this case. Code: https://github.com/jaeho-lee/layer-adaptive-sparsity

What problem does this paper attempt to address?

This paper addresses the question of how to choose the layerwise sparsity for magnitude-based pruning. Simple magnitude-based pruning can reach a state-of-the-art tradeoff between sparsity and performance when the layerwise sparsity is well chosen, but there is no agreed-upon method for making that choice. In most cases it is set per algorithm, through handcrafted heuristics or an extensive hyperparameter search. The authors therefore propose a new importance score for global pruning, the LAMP (Layer-Adaptive Magnitude-based Pruning) score, which determines the sparsity of each layer automatically, without hyperparameter tuning or heavy computation.

### Specific Problem Description

1. **Background and Motivation**:
   - Neural network pruning removes "unimportant weights" in order to meet practical constraints, alleviate overfitting, enhance interpretability, or deepen the understanding of neural network training.
   - Magnitude-based pruning (MP) achieves very good results when the layerwise sparsity is well chosen.
   - However, there is no unified criterion for choosing the layerwise sparsity; it is usually set through handcrafted heuristics or an extensive hyperparameter search.
2. **Shortcomings of Existing Methods**:
   - Most current methods rely on algorithm-specific handcrafted heuristics or extensive hyperparameter searches, which makes the selection of layerwise sparsity complex and unreliable.
   - For example, some methods keep the first convolutional layer completely dense and prune at most 80% of the weights in the last fully-connected layer.
3. **Proposed New Method**:
   - The authors propose a new importance score for global pruning, the LAMP score.
   - The LAMP score is a rescaled weight magnitude that accounts for the model-level $\ell_2$ distortion caused by pruning, and it requires no hyperparameter tuning or heavy computation.
   - The LAMP score selects the sparsity of each layer automatically, which simplifies the pruning process and improves performance.

### Key Points of the Solution

- **Definition of the LAMP Score**:
  \[ \text{score}(u; W) := \frac{(W[u])^2}{\sum_{v \geq u} (W[v])^2} \]
  where $W[u]$ is the entry of the (flattened) weight tensor $W$ at index $u$, with indices sorted in ascending order of weight magnitude, so that $|W[u]| \leq |W[v]|$ whenever $u < v$. The denominator therefore sums over the target weight and every weight in the same layer with an equal or larger index.
- **Function of the LAMP Score**:
  - The LAMP score measures the relative importance of the target connection among all surviving connections in its layer.
  - Globally pruning the connections with the smallest LAMP scores until the global sparsity constraint is met is equivalent to performing magnitude-based pruning with automatically selected layerwise sparsity.
- **Experimental Verification**:
  - The authors ran experiments on multiple convolutional neural network architectures (VGG-16, ResNet-18/34, DenseNet-121, EfficientNet-B0) and image datasets (CIFAR-10/100, SVHN, Restricted ImageNet).
  - The results show that LAMP outperforms existing layerwise sparsity selection schemes in various settings.

In summary, this paper addresses how to choose the layerwise sparsity for magnitude-based pruning and, by introducing the LAMP score, provides an automatic and efficient method that needs no hyperparameter tuning.
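
The score definition and the global pruning step described above can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation (see the linked repository for that); the function names `lamp_scores` and `lamp_prune`, the dictionary-of-lists layer format, and the toy weights are all assumptions made for illustration. It assumes each layer is given as a flattened list of weights.

```python
from itertools import accumulate


def lamp_scores(weights):
    """Return the LAMP score of every weight in one flattened layer."""
    # Sort indices by ascending magnitude, as the score definition assumes.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    squared = [weights[i] ** 2 for i in order]
    # Suffix sums: the denominator at position u sums over all v >= u.
    suffix = list(accumulate(reversed(squared)))[::-1]
    scores = [0.0] * len(weights)
    for pos, i in enumerate(order):
        scores[i] = squared[pos] / suffix[pos] if suffix[pos] > 0 else 0.0
    return scores


def lamp_prune(layers, sparsity):
    """Zero out the globally lowest-scoring fraction `sparsity` of weights.

    `layers` is a hypothetical {layer_name: flat_weight_list} mapping.
    """
    scored = [
        (score, name, idx)
        for name, weights in layers.items()
        for idx, score in enumerate(lamp_scores(weights))
    ]
    scored.sort(key=lambda item: item[0])  # smallest LAMP scores first
    n_prune = round(sparsity * len(scored))
    pruned = {name: list(weights) for name, weights in layers.items()}
    for _, name, idx in scored[:n_prune]:
        pruned[name][idx] = 0.0
    return pruned


# Toy example: one layer with small weights, one with large weights.
toy_layers = {"conv": [0.1, -0.2, 0.3], "fc": [3.0, -4.0]}
toy_pruned = lamp_prune(toy_layers, sparsity=0.6)
print(toy_pruned)  # {'conv': [0.0, 0.0, 0.3], 'fc': [0.0, -4.0]}
```

In this toy case, pruning 60% of the weights by raw global magnitude would remove all three `conv` weights and disconnect the layer. Because the largest weight in each layer always has a LAMP score of 1, ranking by LAMP score instead removes `3.0` from `fc` and keeps `0.3` in `conv`.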

Layer-adaptive sparsity for the Magnitude-based Pruning

Layer-adaptive Structured Pruning Guided by Latency

Class-Aware Pruning for Efficient Neural Networks

Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning

CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information

ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models

Multi-objective Magnitude-Based Pruning for Latency-Aware Deep Neural Network Compression

LSOP: Layer-Scaled One-shot Pruning

Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity

LAP: Latency-aware Automated Pruning with Dynamic-Based Filter Selection

Optimization based Layer-wise Magnitude-based Pruning for DNN Compression

Lookahead: A Far-Sighted Alternative of Magnitude-based Pruning

Pruning Foundation Models for High Accuracy without Retraining

A Simple and Effective Pruning Approach for Large Language Models

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

Reassessing Layer Pruning in LLMs: New Insights and Methods

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

Structural Pruning via Latency-Saliency Knapsack

Layer-Adaptive State Pruning for Deep State Space Models

Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks