Abstract: Deep learning models have evolved into powerful tools for many artificial intelligence tasks. However, deploying deep neural networks in real-world applications remains challenging because of their high computational complexity and storage overhead. Fortunately, neural network compression can convert a densely connected network into a sparsely connected one with low resource demand. Because deep neural networks are complex, a compression mechanism must balance compression ratio against model accuracy. In this article, by analyzing the statistics of channel connections, we propose an interactive neural network compression mechanism that combines out-in-channel pruning and neural network quantization. Many channel pruning works apply structured sparsity regularization to each layer separately. In contrast, we consider correlations between successive layers to retain the predictive power of the compact network. A global greedy pruning algorithm is designed to remove redundant out-in-channels iteratively. Moreover, to address the shortcomings of one-shot quantization, we propose an incremental quantization algorithm along the output-channel dimension, which smooths network fluctuations and recovers accuracy better during retraining. Our mechanism is comprehensively evaluated with various convolutional neural network (CNN) architectures on popular datasets. Notably, on ImageNet-1K, out-in-channel pruning reduces FLOPs by 54.0 percent on AlexNet and by 50.0 percent on ResNet-50, with top-1 accuracy drops of only 0.15 and 0.37 percent, respectively. On classification and style transfer tasks, the advantage of incremental quantization grows as the number of quantization bits decreases.
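
The idea of removing an output channel together with the matching input channel of the next layer, chosen greedily across the whole network, can be illustrated with a small NumPy sketch. This is not the authors' algorithm: the abstract does not state the importance criterion, so the summed L2 norm used here, the function names `out_in_channel_scores` and `greedy_global_prune`, and the absence of per-layer score normalization and retraining are all assumptions made for illustration.

```python
import numpy as np


def out_in_channel_scores(w_curr, w_next):
    """Score every out-in-channel shared by two successive conv layers.

    w_curr has shape (C_out, C_in, k, k); w_next has shape (C_next, C_out, k, k).
    Assumption: importance = L2 norm of the output filter plus L2 norm of
    the matching input slice in the next layer.
    """
    out_norm = np.sqrt((w_curr ** 2).sum(axis=(1, 2, 3)))
    in_norm = np.sqrt((w_next ** 2).sum(axis=(0, 2, 3)))
    return out_norm + in_norm


def greedy_global_prune(weights, n_remove):
    """Remove n_remove out-in-channels, one per iteration, network-wide.

    weights is a list of conv weight arrays for consecutive layers.
    Each iteration picks the lowest-scoring out-in-channel over all
    layer pairs (hypothetical criterion, no retraining in between).
    """
    weights = [w.copy() for w in weights]
    for _ in range(n_remove):
        best = None  # (score, layer index, channel index)
        for i in range(len(weights) - 1):
            if weights[i].shape[0] <= 1:
                continue  # never remove the last channel of a layer
            scores = out_in_channel_scores(weights[i], weights[i + 1])
            c = int(np.argmin(scores))
            if best is None or scores[c] < best[0]:
                best = (float(scores[c]), i, c)
        if best is None:
            break
        _, i, c = best
        # Drop output channel c of layer i and input channel c of layer i + 1.
        weights[i] = np.delete(weights[i], c, axis=0)
        weights[i + 1] = np.delete(weights[i + 1], c, axis=1)
    return weights
```

Deleting both slices in one step keeps the two layers shape-compatible, which is the practical reason to treat the output channel and its downstream input channel as a single unit rather than pruning each layer independently.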

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing

Class-Aware Pruning for Efficient Neural Networks

Structured Pruning for Efficient Convolutional Neural Networks Via Incremental Regularization

Efficient Network Compression Through Smooth-Lasso Constraint

A Pruning Method Based on the Dissimilarity of Angle among Channels and Filters

Where to Prune: Using LSTM to Guide Data-Dependent Soft Pruning

Adversarial Structured Neural Network Pruning

AdaPruner: Adaptive Channel Pruning and Effective Weights Inheritance

An efficient pruning and fine-tuning method for deep spiking neural network

Pruning with Compensation: Efficient Channel Pruning for Deep Convolutional Neural Networks

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures

Students and teachers learning together: a robust training strategy for neural network pruning

Dynamical Channel Pruning by Conditional Accuracy Change for Deep Neural Networks

UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer

Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

Three-Stage Global Channel Pruning for Resources-Limited Platform

Learning Low Resource Consumption CNN through Pruning and Quantization

Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks

Pruning at a Glance: Global Neural Pruning for Model Compression

Channel Pruning Method Based on Decoupling Feature Scale Distribution in Batch Normalization Layers