Abstract: Deep convolutional neural networks (CNNs) have achieved tremendous success but tend to suffer from high computational costs, mainly due to heavy over-parameterization. This makes it difficult to apply them directly to the growing number of applications that run on low-end edge devices with strict power restrictions and real-time inference requirements. Recently, much research attention has been devoted to compressing networks via pruning to address this issue. Most existing methods rely on hand-designed pruning rules, which suffer from several limitations. First, manually designed rules are applicable only to limited application scenarios and can hardly generalize to a broader scope. Second, these rules are typically designed from human experience through trial and error, and are thus highly subjective. Third, channels in different layers of a network may have diverse distributions, so the same pruning rule is not appropriate for every layer. To address these limitations, we propose a novel channel pruning scheme in which task-irrelevant channels are removed in a task-driven manner. Specifically, an adaptive differentiable search module is proposed to automatically find the best pruning rule for each layer of a CNN under sparsity constraints. In addition, we employ knowledge distillation to alleviate excessive performance loss. Once training is finished, a compact network is obtained by removing channels according to the layer-wise pruning rules. We evaluate the proposed method on well-known benchmark datasets, including CIFAR, MNIST, and ImageNet, in comparison with several state-of-the-art pruning methods. Experimental results demonstrate the superiority of our method over the compared ones in terms of both parameter and FLOPs reduction.
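The abstract measures success by parameter and FLOPs reduction after channels are removed under layer-wise rules. The following is a minimal sketch of that bookkeeping in plain Python. It assumes each layer's rule is a simple threshold on hypothetical per-channel importance scores; these thresholds stand in for whatever rules the paper's differentiable search would produce, and neither the search module nor knowledge distillation is modeled. The names `ConvLayer`, `keep_mask`, `prune_chain`, and `reduction` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ConvLayer:
    """Shape of a k x k convolution without bias (hypothetical helper)."""
    c_in: int
    c_out: int
    k: int
    h_out: int
    w_out: int

    def params(self) -> int:
        # One k x k kernel per (input channel, output channel) pair.
        return self.c_in * self.c_out * self.k * self.k

    def flops(self) -> int:
        # Multiply-accumulates: every weight is used once per output pixel.
        return self.params() * self.h_out * self.w_out


def keep_mask(scores: Sequence[float], threshold: float) -> List[bool]:
    """Hypothetical layer-wise rule: keep channels whose score reaches the threshold."""
    mask = [s >= threshold for s in scores]
    if not any(mask):
        # Never empty a layer entirely; keep its highest-scoring channel.
        mask[max(range(len(scores)), key=lambda i: scores[i])] = True
    return mask


def prune_chain(layers: Sequence[ConvLayer],
                scores: Sequence[Sequence[float]],
                thresholds: Sequence[float]) -> List[ConvLayer]:
    """Drop output channels layer by layer; the next layer loses the matching inputs."""
    pruned: List[ConvLayer] = []
    c_in = layers[0].c_in  # the network input (e.g. RGB) is never pruned
    for layer, layer_scores, t in zip(layers, scores, thresholds):
        kept = sum(keep_mask(layer_scores, t))
        pruned.append(ConvLayer(c_in, kept, layer.k, layer.h_out, layer.w_out))
        c_in = kept
    return pruned


def reduction(before: Sequence[ConvLayer], after: Sequence[ConvLayer]) -> Dict[str, float]:
    """Fraction of parameters and FLOPs removed by pruning."""
    p0 = sum(layer.params() for layer in before)
    p1 = sum(layer.params() for layer in after)
    f0 = sum(layer.flops() for layer in before)
    f1 = sum(layer.flops() for layer in after)
    return {"params": 1 - p1 / p0, "flops": 1 - f1 / f0}


if __name__ == "__main__":
    layers = [ConvLayer(3, 4, 3, 32, 32), ConvLayer(4, 4, 3, 32, 32)]
    scores = [[0.9, 0.1, 0.5, 0.05], [0.3, 0.8, 0.2, 0.7]]  # made-up importances
    thresholds = [0.4, 0.5]  # stand-ins for searched per-layer rules
    pruned = prune_chain(layers, scores, thresholds)
    print([layer.c_out for layer in pruned])  # [2, 2]
    print({k: round(v, 3) for k, v in reduction(layers, pruned).items()})
    # {'params': 0.643, 'flops': 0.643}
```

Pruning a layer's output channels also shrinks the next layer's input channels, so savings compound across the chain: here, keeping half the channels in each layer removes about 64% of the weights. Both layers share the same 32 x 32 output size, so FLOPs and parameter reduction coincide. With downsampling, early high-resolution layers weigh more heavily in FLOPs than in parameters, and the two figures diverge.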

An Automatically Layer-wise Searching Strategy for Channel Pruning Based on Task-driven Sparsity Optimization