Abstract: We study network pruning, which aims to remove redundant channels/kernels and hence speed up the inference of deep networks. Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones. Both strategies have limitations: the former is computationally expensive and difficult to converge, while the latter optimizes the reconstruction error but ignores the discriminative power of channels. In this paper, we propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power. To this end, we first introduce additional discrimination-aware losses into the network to increase the discriminative power of the intermediate layers. Next, we select the most discriminative channels for each layer by considering the discrimination-aware loss and the reconstruction error simultaneously. We then formulate channel pruning as a sparsity-inducing optimization problem with a convex objective and propose a greedy algorithm to solve the resultant problem. Note that a channel (3D tensor) often consists of a set of kernels (each a 2D matrix). Besides the redundancy in channels, some kernels in a channel may also be redundant and fail to contribute to the discriminative power of the network, resulting in kernel-level redundancy. To solve this issue, we propose a discrimination-aware kernel pruning (DKP) method to further compress deep networks by removing redundant kernels. To avoid manually determining the pruning rate for each layer, we propose two adaptive stopping conditions to automatically determine the number of selected channels/kernels. The proposed adaptive stopping conditions tend to yield more efficient models with better performance in practice.
Extensive experiments on both image classification and face recognition demonstrate the effectiveness of our methods. For example, on ILSVRC-12, the resultant ResNet-50 model with 30 percent reduction of channels even outperforms the baseline model by 0.36 percent in terms of Top-1 accuracy. We also deploy the pruned models on a smartphone (equipped with a Qualcomm Snapdragon 845 processor). The pruned MobileNetV1 and MobileNetV2 achieve 1.93× and 1.42× inference acceleration on the mobile device, respectively, with negligible performance degradation. The source code and the pre-trained models are available at https://github.com/SCUT-AILab/DCP.
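
The greedy selection with an adaptive stopping condition described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository above for that). It makes several simplifying assumptions: the layer is a plain linear map with one input column per channel, the discrimination-aware loss is replaced by a squared error on label targets, and the stopping rule is a single threshold on the loss decrease. The function name `greedy_channel_selection` and the parameters `lam` and `tol` are hypothetical.

```python
import numpy as np


def greedy_channel_selection(X, F, Y, lam=1.0, tol=1e-3, max_channels=None):
    """Greedily pick input channels under a joint convex objective.

    X: (n, c) inputs, one column per channel (assumed simplification).
    F: (n, d) baseline features to reconstruct.
    Y: (n, k) label targets, e.g. one-hot (squared error stands in for
       the discrimination-aware loss; this is an assumption).
    Returns the selected channel indices (in selection order) and the
    weight matrix W of shape (c, d + k), zero on unselected channels.
    """
    n, c = X.shape
    # Stacking targets makes one least-squares problem equal to
    # reconstruction error + lam * label error.
    T = np.hstack([F, np.sqrt(lam) * Y])
    max_channels = c if max_channels is None else max_channels

    selected = []
    W = np.zeros((c, T.shape[1]))
    initial_loss = 0.5 * np.sum(T ** 2) / n
    prev_loss = initial_loss

    while len(selected) < max_channels:
        # Score each channel by the gradient norm of the joint loss.
        grad = X.T @ (X @ W - T) / n
        scores = np.linalg.norm(grad, axis=1)
        if selected:
            scores[selected] = -np.inf  # never re-select a channel
        trial = selected + [int(np.argmax(scores))]

        # Refit the weights restricted to the candidate channel set.
        sol, *_ = np.linalg.lstsq(X[:, trial], T, rcond=None)
        W_trial = np.zeros_like(W)
        W_trial[trial] = sol
        loss = 0.5 * np.sum((X @ W_trial - T) ** 2) / n

        # Adaptive stop: reject the candidate if it barely helps.
        if prev_loss - loss <= tol * initial_loss:
            break
        selected, W, prev_loss = trial, W_trial, loss

    return selected, W
```

Calling `greedy_channel_selection(X, F, Y)` returns the kept channels without a preset pruning rate: the number of channels is decided by when the loss decrease falls below `tol` times the initial loss. A larger `lam` shifts the selection toward channels that help predict the labels rather than reconstruct the features.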

Efficient Inference for Pruned CNN Models on Mobile Devices With Holistic Sparsity Alignment

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

Single-shot Pruning and Quantization for Hardware-Friendly Neural Network Acceleration

Structured Pruning for Efficient Convolutional Neural Networks Via Incremental Regularization

MCMC: Multi-Constrained Model Compression Via One-Stage Envelope Reinforcement Learning

Class-Aware Pruning for Efficient Neural Networks

An Image Enhancing Pattern-Based Sparsity for Real-Time Inference on Mobile Devices

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-Time Execution on Mobile Devices

Cloud–Edge Collaborative Inference with Network Pruning

Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

A Pruning Method Based on the Dissimilarity of Angle among Channels and Filters

SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity

Efficient Network Compression Through Smooth-Lasso Constraint

Discrimination-aware Network Pruning for Deep Model Compression

Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs

Frequency-Domain Dynamic Pruning for Convolutional Neural Networks

MobilePrune: Neural Network Compression via l(0) Sparse Group Lasso on the Mobile System

Accelerator-Aware Pruning for Convolutional Neural Networks