Where to Prune: Using LSTM to Guide Data-Dependent Soft Pruning

Guiguang Ding, Shuo Zhang, Zizhou Jia, Jing Zhong, Jungong Han

DOI: https://doi.org/10.1109/tip.2020.3035028

IF: 10.6

2021-01-01

IEEE Transactions on Image Processing

Abstract: While convolutional neural networks (CNNs) have achieved overwhelming success in various vision tasks, their heavy computational cost and storage overhead limit their practical use on mobile or embedded devices. Recently, compressing CNN models has attracted considerable attention, and pruning CNN filters, also known as channel pruning, has become a popular research direction due to its high compression rate. In this paper, a new channel pruning framework is proposed, which can significantly reduce the computational complexity while maintaining sufficient model accuracy. Unlike most existing approaches that seek to-be-pruned filters layer by layer, we argue that choosing appropriate layers for pruning is more crucial, which can result in more complexity reduction but less performance drop. To this end, we utilize a long short-term memory (LSTM) network to learn the hierarchical characteristics of a network and generate a global network pruning scheme. On top of it, we propose a data-dependent soft pruning method, dubbed Squeeze-Excitation-Pruning (SEP), which does not physically prune any filters but selectively excludes some kernels from the forward and backward propagation computations depending on the pruning scheme. Compared with hard pruning, our soft pruning can better retain the capacity and knowledge of the baseline model. Experimental results demonstrate that our approach still achieves comparable accuracy even when reducing floating-point operations (FLOPs) by 70.1% for VGG and 47.5% for ResNet-56.
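To make the link between a per-layer pruning scheme and a FLOPs reduction figure concrete, the following is a minimal sketch rather than the authors' accounting. It assumes a plain sequential chain of convolution layers (each layer's input is the previous layer's output), counts multiply-accumulates only, and uses hypothetical names (`ConvLayer`, `conv_flops`, `flops_reduction`) and layer shapes that do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    c_in: int    # input channels
    c_out: int   # output channels (filters)
    kernel: int  # square kernel size
    out_hw: int  # output feature map is out_hw x out_hw


def conv_flops(c_in, c_out, kernel, out_hw):
    """Multiply-accumulate count of one convolution layer."""
    return out_hw * out_hw * c_in * c_out * kernel * kernel


def flops_reduction(layers, keep_ratios):
    """Fraction of FLOPs skipped when layer i keeps keep_ratios[i] of its filters.

    Assumption: dropping a filter in layer i also removes the matching
    input channel of layer i + 1, so both factors shrink.
    """
    base = pruned = 0
    prev_kept = layers[0].c_in  # the network input itself is never pruned
    for layer, keep in zip(layers, keep_ratios):
        kept_out = max(1, round(layer.c_out * keep))
        base += conv_flops(layer.c_in, layer.c_out, layer.kernel, layer.out_hw)
        pruned += conv_flops(prev_kept, kept_out, layer.kernel, layer.out_hw)
        prev_kept = kept_out
    return 1.0 - pruned / base


# Hypothetical two-layer network, keeping half of the filters in each layer.
layers = [ConvLayer(3, 64, 3, 32), ConvLayer(64, 128, 3, 16)]
reduction = flops_reduction(layers, [0.5, 0.5])
print(f"FLOPs reduction: {reduction:.1%}")
```

Because the cost of a layer scales with both its input and output channel counts, the choice of which layers to prune can change the total savings considerably, which is consistent with the abstract's emphasis on layer selection.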

computer science, artificial intelligence; engineering, electrical & electronic

What problem does this paper attempt to address?

The paper addresses the high computational cost and large storage overhead of running Convolutional Neural Networks (CNNs) on mobile or embedded devices. To let CNNs run efficiently on devices with limited computational resources, the authors propose a channel pruning framework designed to significantly reduce computational complexity while maintaining sufficient model accuracy. Unlike most existing methods, which search for filters to prune layer by layer, this paper argues that selecting the appropriate layers to prune is more critical, as it can lead to greater complexity reduction with less performance degradation. To this end, the authors use a Long Short-Term Memory (LSTM) network to learn the hierarchical characteristics of the network and generate a global network pruning scheme. On this basis, they propose a data-dependent soft pruning method called Squeeze-Excitation-Pruning (SEP). This method does not physically remove any filters; instead, it selectively excludes certain convolutional kernels from the forward and backward propagation computations according to the pruning scheme. Experimental results show that the method still achieves comparable accuracy even with floating-point operations (FLOPs) reduced by 70.1% for VGG and 47.5% for ResNet-56. The paper also discusses how to update the LSTM through reinforcement learning to optimize pruning decisions, and how the dynamic, data-dependent soft pruning strategy better retains the capacity and knowledge of the baseline model.
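The idea of data-dependent soft pruning can be illustrated with a small sketch. This is not the paper's SEP implementation: the per-channel sigmoid gate, the top-k selection rule, and the names `squeeze`, `excite`, and `soft_prune` are assumptions made for illustration, and `keep_ratio` stands in for whatever the pruning scheme prescribes for the layer. The sketch only shows the general pattern: summarize each channel, score it, and mask low-scoring channels for the current input while leaving all filters in place.

```python
import math


def squeeze(feature_map):
    """Global average pooling: one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]


def excite(squeezed, weights, biases):
    """Hypothetical per-channel sigmoid gate (stand-in for a learned module)."""
    return [1.0 / (1.0 + math.exp(-(w * s + b)))
            for s, w, b in zip(squeezed, weights, biases)]


def soft_prune(feature_map, weights, biases, keep_ratio):
    """Zero out the lowest-scoring channels for this input; nothing is deleted."""
    scores = excite(squeeze(feature_map), weights, biases)
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    mask = [1 if i in kept else 0 for i in range(len(scores))]
    masked = [[[v * m for v in row] for row in ch]
              for ch, m in zip(feature_map, mask)]
    return masked, mask


def constant_map(values, size=2):
    """Helper: build channels filled with a constant value each."""
    return [[[v] * size for _ in range(size)] for v in values]


weights, biases = [1.0] * 4, [0.0] * 4
_, mask_a = soft_prune(constant_map([1.0, 4.0, 2.0, 3.0]), weights, biases, 0.5)
_, mask_b = soft_prune(constant_map([5.0, 1.0, 4.0, 0.0]), weights, biases, 0.5)
print(mask_a, mask_b)
```

The two inputs produce different masks under the same gate parameters, which is what "data-dependent" means here; since every filter remains in the model, a channel masked for one input can still contribute for another.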

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

Where to Prune: Using LSTM to Guide End-to-end Pruning

Structured Pruning for Efficient Convolutional Neural Networks Via Incremental Regularization

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing.

Class-Aware Pruning for Efficient Neural Networks

Batch-Normalization-based Soft Filter Pruning for Deep Convolutional Neural Networks

Structured Deep Neural Network Pruning by Varying Regularization Parameters.

A Pruning Method Based on the Dissimilarity of Angle among Channels and Filters

Pruning filters with L1-norm and standard deviation for CNN compression

LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch

Efficient Network Compression Through Smooth-Lasso Constraint

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Iterative clustering pruning for convolutional neural networks

Auto-Balanced Filter Pruning for Efficient Convolutional Neural Networks

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

Cross-layer importance evaluation for neural network pruning

Layer Pruning via Fusible Residual Convolutional Block for Deep Neural Networks

Pruning at a Glance: Global Neural Pruning for Model Compression

SNPF: Sensitiveness Based Network Pruning Framework for Efficient Edge Computing

Compressing CNNs Using Multilevel Filter Pruning for the Edge Nodes of Multimedia Internet of Things