Abstract: Structured pruning is a representative model compression technique for convolutional neural networks (CNNs) that removes less important filters or channels. Most recent structured pruning methods define criteria for filter importance based mainly on the magnitude of weights or other network parameters. However, these criteria lack explainability, and the numerical values of network parameters alone are insufficient to assess the relationship between a channel and model performance. Moreover, applying these criteria directly to global pruning may lead to suboptimal solutions, so a search algorithm is needed to determine the pruning ratio for each layer. To address these issues, we propose ARPruning (Attention-map-based Ranking Pruning), which constructs a new pruning criterion for intra-layer channel importance and develops a new local neighborhood search algorithm to determine the optimal inter-layer pruning ratios. To measure the relationship between a channel to be pruned and model performance, we construct the intra-layer channel importance criterion from the attention map of each layer. We then propose an automatic pruning-strategy search method that finds the optimal solution effectively and efficiently. By integrating the pruning criterion with the search strategy, ARPruning maintains a high compression rate while achieving outstanding accuracy. Experiments show that ARPruning achieves better compression results than state-of-the-art pruning methods. The code is available at https://github.com/dozingLee/ARPruning.
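The two components named in the abstract, an attention-map-based channel ranking and a local neighborhood search over per-layer pruning ratios, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the attention map is taken to be the sum of squared activations over channels, a channel's score is its share of that map, and the search is a plain hill climb over ratio vectors. The names `channel_scores`, `rank_channels`, and `local_neighborhood_search` are hypothetical and are not taken from the ARPruning repository.

```python
from typing import Callable, List, Sequence, Tuple

Activation = Sequence[Sequence[Sequence[float]]]  # indexed as [channel][row][col]


def channel_scores(act: Activation) -> List[float]:
    """Score each channel by its share of an activation-based attention map.

    Assumption: the attention map is the sum over channels of squared
    activations, so a channel's share is its squared-activation energy
    divided by the total energy of the layer.
    """
    energies = [sum(v * v for row in ch for v in row) for ch in act]
    total = sum(energies) or 1.0  # avoid division by zero on an all-zero layer
    return [e / total for e in energies]


def rank_channels(act: Activation) -> List[int]:
    """Return channel indices ordered from least to most important."""
    scores = channel_scores(act)
    return sorted(range(len(scores)), key=lambda c: scores[c])


def local_neighborhood_search(
    init: Sequence[float],
    objective: Callable[[List[float]], float],
    feasible: Callable[[List[float]], bool],
    step: float = 0.1,
    max_iters: int = 100,
) -> Tuple[List[float], float]:
    """Hill-climb over per-layer pruning ratios (illustrative, not the paper's algorithm).

    A neighbor differs from the current solution by +/- step in one layer's
    ratio. A neighbor is accepted as soon as it is feasible and improves the
    objective; the search stops when a full pass finds no improvement.
    """
    current = list(init)
    best = objective(current)
    for _ in range(max_iters):
        improved = False
        for i in range(len(current)):
            for delta in (-step, step):
                cand = list(current)
                # Clamp to [0, 0.9] and round to suppress float drift.
                cand[i] = round(min(0.9, max(0.0, cand[i] + delta)), 10)
                if cand == current or not feasible(cand):
                    continue
                score = objective(cand)
                if score > best:
                    current, best, improved = cand, score, True
        if not improved:
            break
    return current, best


# Toy usage: three channels of a 1x2 feature map, and a two-layer search
# whose objective stands in for "accuracy after pruning".
toy_activation = [[[1.0, 1.0]], [[0.0, 0.1]], [[2.0, 0.0]]]
pruning_order = rank_channels(toy_activation)

target = [0.3, 0.5]  # hypothetical best ratios for the toy objective
ratios, value = local_neighborhood_search(
    init=[0.0, 0.0],
    objective=lambda r: -sum((a - b) ** 2 for a, b in zip(r, target)),
    feasible=lambda r: True,
)
```

In a real pipeline the objective would be an evaluation of the pruned network and `feasible` would encode a FLOPs or parameter budget; both are stubbed here so the sketch runs with the standard library alone.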

FreePrune: an Automatic Pruning Framework Across Various Granularities Based on Training-Free Evaluation

Class-Aware Pruning for Efficient Neural Networks

Towards Efficient Filter Pruning Via Adaptive Automatic Structure Search

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

A Comprehensive Study of Structural Pruning for Vision Models

Network Automatic Pruning: Start NAP and Take a Nap

MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning

PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning

ARPruning: An automatic channel pruning based on attention map ranking

Rethinking the Value of Network Pruning

Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework

Adaptive Search-and-Training for Robust and Efficient Network Pruning

GenExp: Multi-objective pruning for deep neural network based on genetic algorithm

When to Prune? A Policy towards Early Structural Pruning

Network Pruning Spaces

ARLP: Automatic Multi-Agent Transformer Reinforcement Learning Pruner for One-Shot Neural Network Pruning

Not All Data Matters: An End-to-End Adaptive Dataset Pruning Framework for Enhancing Model Performance and Efficiency

Adaptive Activation-based Structured Pruning

Pruning Based Training-Free Neural Architecture Search

Fast Hybrid Search for Automatic Model Compression

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration