Abstract: Deep convolutional neural networks (CNNs) have achieved impressive performance in many computer vision tasks. However, their large model sizes require heavy computational resources, making pruning redundant filters from existing pre-trained CNNs an essential task in developing efficient models for resource-constrained devices. Whole-network filter pruning algorithms prune varying fractions of filters from each layer, hence providing greater flexibility. Current whole-network pruning methods are either computationally expensive due to the need to calculate the loss for each pruned filter using a training dataset, or use various heuristic / learned criteria for determining the pruning fractions for each layer. This paper proposes a two-level hierarchical approach for whole-network filter pruning which is efficient and uses the classification loss as the final criterion. The lower-level algorithm (called filter-pruning) uses a sparse-approximation formulation based on linear approximation of filter weights. We explore two algorithms: orthogonal matching pursuit-based greedy selection and a greedy backward pruning approach. The backward pruning algorithm uses a novel closed-form error criterion for efficiently selecting the optimal filter at each stage, thus making the whole algorithm much faster. The higher-level algorithm (called layer-selection) greedily selects the best-pruned layer (pruning using the filter-selection algorithm) using a global pruning criterion. We propose algorithms for two different global-pruning criteria: (1) layer-wise relative error (HBGS), and (2) final classification error (HBGTS). Our suite of algorithms outperforms state-of-the-art pruning methods on ResNet18, ResNet32, ResNet56, VGG16, and ResNext101. Our method reduces the RAM requirement for ResNext101 from 7.6 GB to 1.5 GB and achieves a 94% reduction in FLOPS without losing accuracy on CIFAR-10.
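
As a rough illustration of the lower-level idea in the abstract, the sketch below picks a subset of filters in one layer so that every filter is approximated by a linear combination of the selected ones, using an orthogonal-matching-pursuit-style greedy forward selection. It is a minimal sketch, not the paper's implementation: the function name `greedy_filter_selection`, the flattened weight matrix `W`, and the scoring rule are assumptions made for illustration.

```python
import numpy as np


def greedy_filter_selection(W, k):
    """Greedily pick k rows of W whose span approximates all rows of W.

    W: (n_filters, d) array; each row is one flattened filter (assumed layout).
    Returns (list of selected row indices, relative approximation error).
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    # Unit-norm copies of the filters act as the dictionary atoms.
    norms = np.linalg.norm(W, axis=1)
    atoms = W / np.maximum(norms, 1e-12)[:, None]

    chosen = np.zeros(n, dtype=bool)
    selected = []
    residual = W.copy()
    for _ in range(k):
        # Score each atom by the residual energy it explains across all filters.
        scores = np.sum((residual @ atoms.T) ** 2, axis=0)
        scores[chosen] = -np.inf  # never pick the same filter twice
        j = int(np.argmax(scores))
        chosen[j] = True
        selected.append(j)
        # Orthogonal step: refit all filters on the selected ones by least squares.
        basis = W[selected].T  # shape (d, number selected)
        coeffs, *_ = np.linalg.lstsq(basis, W.T, rcond=None)
        residual = W - (basis @ coeffs).T

    rel_err = np.linalg.norm(residual) / np.linalg.norm(W)
    return selected, rel_err
```

For example, if a layer's four filters all lie in a two-dimensional subspace, calling `greedy_filter_selection(W, 2)` should return two indices with a relative error close to zero, which is the sense in which the remaining filters are redundant.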

What problem does this paper attempt to address?

The problem this paper addresses is how to perform whole-network filter pruning in convolutional neural networks (CNNs) efficiently, reducing the model's computational resource requirements while maintaining or improving its accuracy. Specifically, the paper aims to develop a simple and efficient whole-network filter pruning technique so that the pruned model can run on resource-constrained devices.

### Problem Background

Deep convolutional neural networks (CNNs) perform well in many computer vision tasks, but their large model sizes demand substantial computational resources. Pruning redundant filters is therefore a key step in deploying these models on resource-constrained devices. Whole-network filter pruning algorithms can prune a different fraction of filters from each layer, which gives them more flexibility. However, existing whole-network pruning methods are either computationally expensive, because they use the training dataset to calculate the loss for each pruned filter, or rely on various heuristic or learned criteria to determine the pruning fraction for each layer. A simple and efficient whole-network pruning technique is therefore needed.

### Solutions Proposed in the Paper

The paper proposes a two-level hierarchical greedy method for whole-network filter pruning that is efficient and uses the classification loss as the final criterion. Specifically:

1. **Lower-level algorithm (filter pruning)**:
   - Uses a sparse-approximation formulation based on linear approximation of filter weights.
   - Explores two algorithms: greedy selection based on orthogonal matching pursuit, and a greedy backward pruning approach.
   - The backward pruning algorithm uses a novel closed-form error criterion, which makes selecting the best filter at each stage faster.
2. **Higher-level algorithm (layer selection)**:
   - Greedily selects the best-pruned layer (pruned using the filter-pruning algorithm) according to a global pruning criterion.
   - Proposes algorithms for two global pruning criteria: (1) layer-wise relative error (HBGS) and (2) final classification error (HBGTS).

### Experimental Results

The experiments show that the method outperforms existing state-of-the-art pruning methods on standard pre-trained CNN models such as ResNet18, ResNet32, ResNet56, VGG16, and ResNext101. In particular, when the number of parameters is reduced by more than 90%, the method improves accuracy by about 5% compared to existing methods. It also significantly reduces memory requirements and FLOPS. For example, it reduces the RAM requirement of ResNext101 from 7.6 GB to 1.5 GB and achieves a 94% reduction in FLOPS without losing accuracy on CIFAR-10.

### Summary

The paper proposes a novel hierarchical greedy framework for non-uniform filter pruning, together with FP-Backward, a backward-elimination scheme that uses a closed-form expression for the approximation error. It also proposes HBGTS, which selects layers by directly optimizing the classification error. Together, these contributions provide an efficient and accurate whole-network pruning technique suitable for large-scale networks.
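
The two-level structure described above can be sketched as follows, using layer-wise relative error as the global criterion (in the spirit of HBGS). This is a hedged sketch under stated assumptions: every function name here is hypothetical, the lower level is a brute-force backward step rather than the paper's closed-form criterion, and the error is measured on filter weights only, so it does not cover the classification-error variant (HBGTS).

```python
import numpy as np


def relative_error(W, keep):
    """Relative error when every row of W is approximated from the rows in `keep`."""
    basis = W[keep].T  # shape (d, number kept)
    coeffs, *_ = np.linalg.lstsq(basis, W.T, rcond=None)
    return np.linalg.norm(W - (basis @ coeffs).T) / np.linalg.norm(W)


def best_single_removal(W, keep):
    """Lower level (brute force): try dropping each kept filter, return the best.

    Returns (relative error after removal, new list of kept indices).
    """
    best = None
    for j in keep:
        trial = [i for i in keep if i != j]
        err = relative_error(W, trial)
        if best is None or err < best[0]:
            best = (err, trial)
    return best


def hierarchical_greedy_prune(layers, n_prune):
    """Higher level: repeatedly prune one filter from the layer where it costs least.

    layers: dict mapping layer name -> (n_filters, d) array (assumed layout).
    Returns a dict mapping layer name -> list of kept filter indices.
    """
    layers = {name: np.asarray(W, dtype=float) for name, W in layers.items()}
    keep = {name: list(range(W.shape[0])) for name, W in layers.items()}
    for _ in range(n_prune):
        # Ask the lower level for each layer's best candidate removal.
        candidates = {
            name: best_single_removal(W, keep[name])
            for name, W in layers.items()
            if len(keep[name]) > 1  # always leave at least one filter per layer
        }
        if not candidates:
            break
        # Commit the removal with the smallest layer-wise relative error.
        chosen = min(candidates, key=lambda name: candidates[name][0])
        keep[chosen] = candidates[chosen][1]
    return keep
```

Because each layer competes on its own relative error, a layer with many linearly dependent filters is pruned before a layer whose filters are all independent, which is how the per-layer pruning fractions end up non-uniform.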

A Greedy Hierarchical Approach to Whole-Network Filter-Pruning in CNNs
