Abstract: While convolutional neural networks (CNNs) have achieved overwhelming success in various vision tasks, their heavy computational cost and storage overhead limit their practical use on mobile and embedded devices. Recently, compressing CNN models has attracted considerable attention, and pruning CNN filters, also known as channel pruning, has become a popular research direction because of its high compression rate. In this paper, we propose a new channel pruning framework that significantly reduces computational complexity while maintaining sufficient model accuracy. Unlike most existing approaches, which search for to-be-pruned filters layer by layer, we argue that choosing appropriate layers for pruning is more crucial, as it can yield greater complexity reduction with a smaller performance drop. To this end, we use a long short-term memory (LSTM) network to learn the hierarchical characteristics of a network and generate a global network pruning scheme. On top of it, we propose a data-dependent soft pruning method, dubbed Squeeze-Excitation-Pruning (SEP), which does not physically prune any filters but selectively excludes some kernels from the forward and backward propagation computations according to the pruning scheme. Compared with hard pruning, our soft pruning better retains the capacity and knowledge of the baseline model. Experimental results demonstrate that our approach still achieves comparable accuracy even when reducing floating-point operations (FLOPs) by 70.1% for VGG and 47.5% for ResNet-56.
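The soft pruning idea in the abstract (exclude channels from the computation without deleting their filters) can be illustrated with a minimal sketch. This is not the authors' SEP implementation: the abstract does not specify how channels are scored, so the sketch assumes a plain global-average "squeeze" as the importance score and omits any learned excitation layers. The per-layer `prune_ratio`, which in the paper would come from the LSTM-generated pruning scheme, is simply passed in as a number, and all function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel: an H x W grid of activations


def squeeze(channels: List[FeatureMap]) -> List[float]:
    """Global average pooling: one scalar descriptor per channel."""
    scores = []
    for fmap in channels:
        values = [v for row in fmap for v in row]
        scores.append(sum(values) / len(values))
    return scores


def soft_prune_mask(scores: List[float], prune_ratio: float) -> List[int]:
    """Return a 0/1 mask that excludes the lowest-scoring channels.

    Assumption: importance is the magnitude of the squeezed descriptor.
    prune_ratio is the fraction of channels to exclude in this layer.
    """
    n_prune = int(len(scores) * prune_ratio)
    # Rank channel indices by descriptor magnitude, smallest first.
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    mask = [1] * len(scores)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(channels: List[FeatureMap], mask: List[int]) -> List[FeatureMap]:
    """Zero out masked channels; the filters themselves are left untouched."""
    return [[[v * m for v in row] for row in fmap]
            for fmap, m in zip(channels, mask)]


# Four toy 2x2 channels whose means are roughly 1.0, 0.05, 0.5 and 0.2.
activations = [
    [[0.9, 1.1], [1.0, 1.0]],
    [[0.0, 0.1], [0.1, 0.0]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.2, 0.2], [0.2, 0.2]],
]
mask = soft_prune_mask(squeeze(activations), prune_ratio=0.5)
print(mask)  # [1, 0, 1, 0]
masked = apply_mask(activations, mask)
```

Because the mask is recomputed from the activations, a different input can exclude a different set of channels, which is one way to read "data-dependent". Since no filter is removed, a channel excluded for one input remains available for another.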

PipePrune: Pipeline Parallel Based on Convolutional Layer Pruning for Distributed Deep Learning

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing

Structured Probabilistic Pruning for Convolutional Neural Network Acceleration

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training

BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training

PipeCompress: Accelerating Pipelined Communication for Distributed Deep Learning

PipeMare: Asynchronous Pipeline Parallel DNN Training

PipeDream: Fast and Efficient Pipeline Parallel DNN Training

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

A Dynamic Pruning Method on Multiple Sparse Structures in Deep Neural Networks

A Pruning Method Based on the Dissimilarity of Angle among Channels and Filters

ElasticPipe

A Compact Parallel Pruning Scheme for Deep Learning Model and Its Mobile Instrument Deployment

DBP: Discrimination Based Block-Level Pruning for Deep Model Acceleration

Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression

Analyzing the Performance of Graph Neural Networks with Pipe Parallelism

UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer

Where to Prune: Using LSTM to Guide Data-Dependent Soft Pruning

GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism