Abstract: Structured pruning is a commonly used approach for compressing convolutional neural networks (CNNs). Setting the pruning rate is a fundamental problem in structured pruning. Most existing works either introduce too many additional learnable parameters to assign different pruning rates to different layers of a CNN, or cannot control the compression rate explicitly. Moreover, because an overly narrow network blocks information flow during training, automatic pruning rate setting cannot explore a high pruning rate for a specific layer. To overcome these limitations, we propose a novel framework named Layer Adaptive Progressive Pruning (LAPP), which gradually compresses the network during the first few epochs of training from scratch. In particular, LAPP designs an effective and efficient pruning strategy that introduces a learnable threshold for each layer and FLOPs constraints for the network. Guided by both the task loss and the FLOPs constraints, the learnable thresholds are updated dynamically and gradually to accommodate changes in the importance scores during training. The pruning strategy can therefore gradually prune the network and automatically determine an appropriate pruning rate for each layer. In addition, to maintain the expressive power of each pruned layer, we introduce an additional lightweight bypass for each convolutional layer to be pruned before training starts, which adds only a small overhead. Our method demonstrates superior performance gains over previous compression methods on various datasets and backbone architectures. For example, on CIFAR-10, our method compresses ResNet-20 to 40.3% without an accuracy drop. On ImageNet, it reduces the FLOPs of ResNet-18 by 55.6% while increasing top-1 accuracy by 0.21% and top-5 accuracy by 0.40%.
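
The interplay between per-layer thresholds and a network-level FLOPs budget can be illustrated with a small sketch. This is not the LAPP implementation: the abstract does not specify how importance scores are computed, how the thresholds are made learnable, or the exact form of the FLOPs constraint. The function names, the quadratic penalty, the toy scores, and the two-layer chain below are all assumptions chosen only to show how a threshold turns scores into a channel mask and how the resulting FLOPs compare against a target.

```python
def keep_mask(scores, threshold):
    """Keep a channel only if its importance score exceeds the layer threshold."""
    return [s > threshold for s in scores]


def conv_flops(c_in, c_out, kernel, out_h, out_w):
    """Multiply-accumulate count of a plain convolution (bias ignored)."""
    return c_in * c_out * kernel * kernel * out_h * out_w


def network_flops(layers, kept):
    """FLOPs of a simple chain of conv layers.

    layers: list of (c_in, c_out, kernel, out_h, out_w)
    kept:   number of output channels kept in each layer
    Pruning a layer's outputs also shrinks the next layer's inputs.
    """
    total = 0
    prev_kept = layers[0][0]  # network input channels are never pruned
    for (_, _, kernel, out_h, out_w), n_kept in zip(layers, kept):
        total += conv_flops(prev_kept, n_kept, kernel, out_h, out_w)
        prev_kept = n_kept
    return total


def flops_penalty(current, original, target_ratio):
    """Hypothetical penalty: zero once the budget is met, quadratic above it."""
    excess = current / original - target_ratio
    return max(0.0, excess) ** 2


# Toy two-layer network with made-up importance scores and thresholds.
layers = [(3, 4, 3, 8, 8), (4, 4, 3, 8, 8)]
scores = [[0.9, 0.1, 0.5, 0.05], [0.2, 0.8, 0.7, 0.6]]
thresholds = [0.3, 0.5]  # one threshold per layer

masks = [keep_mask(s, t) for s, t in zip(scores, thresholds)]
kept = [sum(m) for m in masks]
original = network_flops(layers, [c_out for _, c_out, _, _, _ in layers])
current = network_flops(layers, kept)
penalty = flops_penalty(current, original, target_ratio=0.5)

print(kept, current / original, penalty)
```

In an actual training loop, the hard comparison in `keep_mask` would need a differentiable relaxation so that the task loss and the FLOPs term can both update the thresholds. The abstract does not describe that mechanism, so it is omitted here.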

Anonymous Model Pruning for Compressing Deep Neural Networks

Class-Aware Pruning for Efficient Neural Networks

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing

Pruning at a Glance: Global Neural Pruning for Model Compression

AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates

A Novel Deep Learning Model Compression Algorithm

Network Automatic Pruning: Start NAP and Take a Nap

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

Pruning and quantization for deep neural network acceleration: A survey

AACP: Model Compression by Accurate and Automatic Channel Pruning

Optimization based Layer-wise Magnitude-based Pruning for DNN Compression

Conditional Automated Channel Pruning for Deep Neural Networks

A Dynamic Pruning Method on Multiple Sparse Structures in Deep Neural Networks

An efficient pruning and fine-tuning method for deep spiking neural network

Pruning On-the-Fly: A Recoverable Pruning Method without Fine-tuning

Non-Parametric Adaptive Network Pruning

Differential Evolution Based Layer-Wise Weight Pruning for Compressing Deep Neural Networks

LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch

Fast Hybrid Search for Automatic Model Compression

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures