Abstract: Filter pruning of convolutional neural networks (CNNs) is a common technique to effectively reduce the memory footprint, the number of arithmetic operations, and, consequently, inference time. Recent pruning approaches also consider the targeted device (i.e., graphics processing units) for CNN deployment to reduce the actual inference time. However, simple metrics, such as the \(\ell_1\)-norm, are used for deciding which filters to prune. In this work, we propose a hardware-aware technique to explore the vast multi-objective design space of possible filter pruning configurations. Our approach incorporates not only the targeted device but also techniques from explainable artificial intelligence for ranking and deciding which filters to prune. For each layer, the number of filters to be pruned is optimized with the objective of minimizing the inference time and the error rate of the CNN. Experimental results show that our approach can speed up inference time by 1.40× and 1.30× for VGG-16 on the CIFAR-10 dataset and ResNet-18 on the ILSVRC-2012 dataset, respectively, compared to the state-of-the-art ABCPruner.
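For reference, the simple \(\ell_1\)-norm criterion that the abstract contrasts with can be sketched as follows. This is a minimal pure-Python illustration, not code from the paper: it assumes a layer's weights are given as nested lists indexed `[filter][input_channel][row][col]`, and the names `l1_norm` and `filters_to_prune` are hypothetical.

```python
from typing import List

# Hypothetical layout: one filter = weights[input_channel][row][col].
Filter = List[List[List[float]]]


def l1_norm(filt: Filter) -> float:
    """Sum of absolute weights of one filter over all input channels and kernel positions."""
    return sum(abs(w) for channel in filt for row in channel for w in row)


def filters_to_prune(filters: List[Filter], num_to_prune: int) -> List[int]:
    """Indices (in ascending order) of the num_to_prune filters with the smallest l1-norm."""
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    return sorted(ranked[:num_to_prune])


if __name__ == "__main__":
    # Toy layer: 4 filters, 1 input channel, 2x2 kernels.
    layer = [
        [[[0.5, -0.5], [0.5, 0.5]]],   # l1 = 2.0
        [[[0.1, 0.0], [-0.1, 0.0]]],   # l1 = 0.2
        [[[1.0, 1.0], [-1.0, 1.0]]],   # l1 = 4.0
        [[[0.0, 0.3], [0.0, -0.2]]],   # l1 = 0.5
    ]
    print(filters_to_prune(layer, 2))  # → [1, 3]
```

The ranking looks only at weight magnitudes; it knows nothing about the target hardware or about how much each filter actually contributes to the prediction, which is the gap the paper's hardware-aware, XAI-based ranking is meant to close.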

What problem does this paper attempt to address?

The paper addresses how to reduce the memory footprint, the number of arithmetic operations, and the inference time of convolutional neural networks (CNNs) through hardware-aware evolutionary explainable filter pruning, while preserving model accuracy. Specifically, it focuses on the following issues:

1. **Limitations of existing methods**:
   - Traditional filter pruning methods usually rely on simple metrics (such as the \(\ell_1\) norm) to decide which filters to prune, and they ignore the specific hardware characteristics of the target device (such as a GPU).
   - Existing methods explore the large multi-objective design space inefficiently, which makes it difficult to find optimal solutions.
2. **Introduction of hardware awareness and explainable AI**:
   - The paper combines hardware awareness with explainable artificial intelligence (XAI) to evaluate the importance of each filter more accurately and to determine the pruning strategy.
   - A sensitivity analysis automatically determines the pruning step size for each layer, avoiding manual parameter setting.
3. **Optimization objectives**:
   - Instead of only minimizing the number of floating-point operations (FLOPs), the method directly minimizes the inference time on the target device.
   - At the same time, it minimizes the error rate so that the pruned CNN maintains high accuracy.
4. **Experimental verification**:
   - Experiments were carried out on an Nvidia RTX 2080 Ti GPU. Compared with the state-of-the-art ABCPruner, the method speeds up inference by 1.40× for VGG-16 on CIFAR-10 and by 1.30× for ResNet-18 on ILSVRC-2012.
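The summary does not define the paper's XAI metric, so the sketch below swaps in a generic gradient × activation attribution as a stand-in, only to show how an explanation-based score can disagree with a magnitude-based one. It assumes that, for each filter, you have already recorded its flattened output feature map and the gradient of the loss with respect to that map; `attribution_scores` and `least_relevant` are hypothetical names.

```python
from typing import List, Sequence


def attribution_scores(activations: Sequence[Sequence[float]],
                       gradients: Sequence[Sequence[float]]) -> List[float]:
    """Per-filter importance as the mean |activation * gradient| over all recorded
    positions (batch x height x width, flattened to 1-D).

    activations[f] is the output feature map of filter f; gradients[f] is the
    gradient of the loss w.r.t. that feature map (stand-in attribution, not the paper's metric).
    """
    scores = []
    for acts, grads in zip(activations, gradients):
        if not acts or len(acts) != len(grads):
            raise ValueError("feature map and gradient must be non-empty and equal in size")
        scores.append(sum(abs(a * g) for a, g in zip(acts, grads)) / len(acts))
    return scores


def least_relevant(scores: Sequence[float], k: int) -> List[int]:
    """Indices (ascending) of the k filters with the lowest attribution score."""
    return sorted(sorted(range(len(scores)), key=lambda i: scores[i])[:k])


if __name__ == "__main__":
    acts = [[1.0, 2.0], [3.0, 0.0], [0.5, 0.5]]
    grads = [[0.1, 0.1], [0.0, 0.0], [1.0, -1.0]]
    print(least_relevant(attribution_scores(acts, grads), 1))  # → [1]
```

In the toy input, filter 1 produces the largest single activation but receives zero gradient, so the attribution ranks it least relevant, whereas a mean-activation criterion would instead pick filter 2.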
### Summary of key contributions

- **Hardware-Aware Design Space Exploration (DSE)**: systematically explores the filter pruning options of a given CNN while taking hardware characteristics into account.
- **XAI-Based Filter Importance Ranking**: introduces new XAI metrics to evaluate the importance of the filters in each layer and to decide which filters to prune.
- **Automatically Determined Pruning Step Size**: assigns an appropriate pruning step size to each layer through sensitivity analysis, without requiring user expertise.
- **Multi-Objective Evolutionary Algorithm (MOEA)**: combines XAI-based filter pruning with an MOEA that optimizes error rate and inference time to find non-dominated configurations.

Through these improvements, the paper provides a more efficient and accurate CNN pruning method that is especially suitable for embedded devices and resource-constrained environments.
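The idea of non-dominated configurations can be made concrete with a small dominance check over the two objectives named above, inference time and error rate, both minimized. This is a minimal sketch, not the paper's MOEA: the `Candidate` record, its per-layer genome, and the numbers in the example are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    # Hypothetical genome: number of filters removed in each prunable layer.
    pruned_per_layer: Tuple[int, ...]
    latency_ms: float   # inference time measured on the target device
    error_rate: float   # validation error of the pruned network


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in at least one."""
    no_worse = a.latency_ms <= b.latency_ms and a.error_rate <= b.error_rate
    strictly_better = a.latency_ms < b.latency_ms or a.error_rate < b.error_rate
    return no_worse and strictly_better


def non_dominated(population: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (the Pareto front)."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    population = [
        Candidate((0, 0, 0), 10.0, 0.060),
        Candidate((8, 16, 32), 7.5, 0.065),
        Candidate((16, 32, 64), 6.0, 0.080),
        Candidate((8, 32, 32), 7.8, 0.070),  # slower and less accurate than (8, 16, 32)
    ]
    front = non_dominated(population)
    print([c.pruned_per_layer for c in front])
    # → [(0, 0, 0), (8, 16, 32), (16, 32, 64)]
```

A complete MOEA such as NSGA-II would build selection, crossover, mutation, and crowding-distance ranking on top of this test; the quadratic filter here only shows what "non-dominated" means for the latency/error trade-off.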

Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing

Class-Aware Pruning for Efficient Neural Networks

Batch-Normalization-based Soft Filter Pruning for Deep Convolutional Neural Networks

Towards Efficient Filter Pruning Via Adaptive Automatic Structure Search

Structured Pruning for Efficient Convolutional Neural Networks Via Incremental Regularization

Convolutional neural network acceleration algorithm based on filters pruning

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Where to Prune: Using LSTM to Guide Data-Dependent Soft Pruning

A Greedy Hierarchical Approach to Whole-Network Filter-Pruning in CNNs

Model pruning based on filter similarity for edge device deployment

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

Accelerating Convolutional Neural Networks By Group-Wise 2d-Filter Pruning

Network Pruning Using Adaptive Exemplar Filters

Provable Filter Pruning for Efficient Neural Networks

Auto-Balanced Filter Pruning for Efficient Convolutional Neural Networks

Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications

Neural Network Pruning by Cooperative Coevolution

Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

Pruning filters with L1-norm and standard deviation for CNN compression

Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks