Abstract: Salience-based channel pruning continues to make breakthroughs in network compression. In these methods, a salience mechanism serves as the metric of channel importance that guides pruning, so the channel width can be adjusted dynamically at run-time, which provides a flexible pruning scheme. However, two problems emerge: (1) a gating function is often needed to truncate specific salience entries to zero, which destabilizes forward propagation; (2) the dynamic architecture incurs extra indexing cost at inference, which bottlenecks inference speed. In this paper, we propose a Progressive Channel-Shrinking (PCS) method that compresses the selected salience entries at run-time instead of roughly approximating them to zero. We also propose a Running Shrinking Policy that provides a testing-static pruning scheme and reduces the memory access cost of filter indexing. We evaluate our method on the ImageNet and CIFAR10 datasets with two prevalent networks, ResNet and VGG, and demonstrate that PCS outperforms all baselines and achieves a state-of-the-art compression-performance tradeoff. Moreover, we observe a significant and practical acceleration of inference.

What problem does this paper attempt to address?

The paper addresses two major problems in current salience-based channel pruning methods that dynamically adjust the channel width at run-time:

1. **Sudden Pruning Operation**: The salience vector rarely contains zero values, so disabling a channel's output (i.e., pruning the channel) usually requires a gate function or step function that quantizes or truncates some salience entries to zero. This sudden pruning harms network performance: different inputs keep changing the pruning scheme during training, which makes training unstable.
2. **Inefficient Inference**: Because the pruning scheme is not fixed across input samples, every inference requires a large number of channel indexing operations. This increases the memory access cost (MAC) and reduces inference speed.

To solve these problems, the paper proposes the Progressive Channel-Shrinking (PCS) method, which has two main components:

1. **Progressive Shrinking Strategy**: A continuous and differentiable salience generator avoids the back-propagation problem that non-differentiable gates introduce. The salience entries of relatively low-salience channels are then shrunk gradually instead of being truncated directly to zero, which avoids the training instability caused by sudden pruning.
2. **Running Shrinking Policy**: This policy ensures that the pruning scheme is the same for all samples, so the disabled channels can be removed outright after training. No weight indexing is then required during inference, which significantly improves inference speed.

With these two components, the paper aims for a more stable training process and more efficient inference, together with a better tradeoff between compression and model performance. The reported experiments show that PCS outperforms existing channel pruning methods on ResNet and VGG, evaluated on the ImageNet and CIFAR10 datasets.
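
The two components described above can be illustrated with a small sketch. It is not the paper's implementation: the sigmoid salience generator, the linear `shrink_factor` schedule, the moving-average `update_running` rule, and all function names are assumptions chosen only to show the contrast between truncating salience entries to zero and shrinking them gradually, and how a shared running statistic could yield one pruning scheme for every input.

```python
import math


def salience(logits):
    """Continuous, differentiable salience in (0, 1) via an element-wise sigmoid."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def top_k(values, keep):
    """Indices of the `keep` largest values, returned in ascending index order."""
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:keep])


def hard_gate(s, keep):
    """Baseline: truncate every entry outside the top `keep` to zero."""
    kept = set(top_k(s, keep))
    return [v if i in kept else 0.0 for i, v in enumerate(s)]


def progressive_shrink(s, keep, factor):
    """Scale low-salience entries by `factor` (1.0 = untouched, 0.0 = pruned)."""
    kept = set(top_k(s, keep))
    return [v if i in kept else v * factor for i, v in enumerate(s)]


def shrink_factor(step, total_steps):
    """Hypothetical linear schedule that decays from 1.0 to 0.0 over training."""
    return max(0.0, 1.0 - step / total_steps)


def update_running(running, s, momentum=0.9):
    """Exponential moving average of salience across samples (assumed form)."""
    return [momentum * r + (1.0 - momentum) * v for r, v in zip(running, s)]


def static_prune(filters, running, keep):
    """Keep the same channels for every input, so inference needs no indexing."""
    return [filters[i] for i in top_k(running, keep)]


if __name__ == "__main__":
    s = salience([2.0, -1.0, 0.5, -3.0])
    # Low-salience entries fade out over training instead of jumping to zero.
    for step in (0, 5, 10):
        print(step, progressive_shrink(s, keep=2, factor=shrink_factor(step, 10)))
    # After training, the filters of disabled channels are physically removed.
    print(static_prune(["f0", "f1", "f2", "f3"], [0.9, 0.1, 0.6, 0.05], keep=2))
```

With `factor` at 0.0 the shrunk vector coincides with the hard-gated one, so in this sketch the final network is the same as a gated one and only the path taken during training differs. Because `static_prune` selects channels from a statistic shared by all samples, the kept filters can be stored as a smaller dense layer.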

Progressive Channel-Shrinking Network

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing.

Structured Pruning for Efficient Convolutional Neural Networks Via Incremental Regularization

A Pruning Method Based on the Dissimilarity of Angle among Channels and Filters

PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator

AACP: Model Compression by Accurate and Automatic Channel Pruning.

Efficient Network Compression Through Smooth-Lasso Constraint

Pruning with Compensation: Efficient Channel Pruning for Deep Convolutional Neural Networks

Automatic channel pruning via clustering and swarm intelligence optimization for CNN

CPRNC: Channels pruning via reverse neuron crowding for model compression

UPSCALE: Unconstrained Channel Pruning

Dynamical Channel Pruning by Conditional Accuracy Change for Deep Neural Networks

LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch

Weak sub-network pruning for strong and efficient neural networks

SIECP: Neural Network Channel Pruning based on Sequential Interval Estimation

Single-path Bit Sharing for Automatic Loss-aware Model Compression

Learning Low Resource Consumption CNN through Pruning and Quantization

A Channel Pruning Algorithm Based On Depth-Wise Separable Convolution Unit