Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks

Christian Heidorn, Muhammad Sabih, Nicolai Meyerhöfer, Christian Schinabeck, Jürgen Teich, Frank Hannig
DOI: https://doi.org/10.1007/s10766-024-00760-5
2024-02-23
International Journal of Parallel Programming
Abstract: Filter pruning of convolutional neural networks (CNNs) is a common technique to effectively reduce the memory footprint, the number of arithmetic operations, and, consequently, inference time. Recent pruning approaches also consider the targeted device (i.e., graphics processing units) for CNN deployment to reduce the actual inference time. However, simple metrics, such as the \(\ell_1\)-norm, are used for deciding which filters to prune. In this work, we propose a hardware-aware technique to explore the vast multi-objective design space of possible filter pruning configurations. Our approach incorporates not only the targeted device but also techniques from explainable artificial intelligence for ranking and deciding which filters to prune. For each layer, the number of filters to be pruned is optimized with the objective of minimizing the inference time and the error rate of the CNN. Experimental results show that our approach can speed up inference time by 1.40× and 1.30× for VGG-16 on the CIFAR-10 dataset and ResNet-18 on the ILSVRC-2012 dataset, respectively, compared to the state-of-the-art ABCPruner.
computer science, theory & methods
What problem does this paper attempt to address?
The problem this paper addresses is how to reduce the memory footprint, the number of arithmetic operations, and the inference time of convolutional neural networks (CNNs) through hardware-aware evolutionary explainable filter pruning while preserving model accuracy. Specifically, the paper focuses on the following issues:

1. **Limitations of existing methods**:
   - Traditional filter pruning methods usually use simple metrics (such as the \(\ell_1\) norm) to decide which filters to prune and ignore the hardware characteristics of the target device (such as a GPU). Recent hardware-aware approaches still rely on such simple metrics.
   - Existing methods explore the large multi-objective design space inefficiently, which makes good pruning configurations hard to find.

2. **Introduction of hardware awareness and explainable AI**:
   - The paper proposes a method that combines hardware awareness with explainable artificial intelligence (XAI) to rank the importance of each filter and decide which filters to prune.
   - A sensitivity analysis automatically determines the pruning step size for each layer, so no manual parameter setting is needed.

3. **Optimization objectives**:
   - Instead of only minimizing the number of floating-point operations (FLOPs), the method directly minimizes the inference time on the target device.
   - At the same time, it minimizes the error rate so that the pruned CNN retains high accuracy.

4. **Experimental verification**:
   - Experiments were carried out on an Nvidia RTX 2080 Ti GPU. Compared with the state-of-the-art ABCPruner, the method speeds up inference by 1.40× for VGG-16 on CIFAR-10 and by 1.30× for ResNet-18 on ILSVRC-2012.
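To make the first point concrete, the following is a minimal sketch of the simple \(\ell_1\)-norm criterion that the paper contrasts its XAI-based ranking with. It is not the paper's method. The function names `l1_norm` and `filters_to_prune`, the nested-list weight layout (filters, input channels, kernel rows, kernel columns), and the toy weights are all assumptions made for illustration.

```python
from typing import List, Sequence, Union

Nested = Union[float, Sequence["Nested"]]


def l1_norm(weights: Nested) -> float:
    """Sum of absolute values over an arbitrarily nested list of numbers."""
    if isinstance(weights, (int, float)):
        return abs(float(weights))
    return sum(l1_norm(w) for w in weights)


def filters_to_prune(layer_weights: Sequence[Nested], num_pruned: int) -> List[int]:
    """Return the indices of the `num_pruned` filters with the smallest l1-norm."""
    if not 0 <= num_pruned <= len(layer_weights):
        raise ValueError("num_pruned must lie between 0 and the number of filters")
    norms = [l1_norm(f) for f in layer_weights]
    # Rank filter indices from least to most "important" under the l1 criterion.
    ranked = sorted(range(len(norms)), key=lambda i: norms[i])
    return sorted(ranked[:num_pruned])


# Toy layer (hypothetical weights): 4 filters, 1 input channel, 2x2 kernels.
layer = [
    [[[0.5, -0.5], [0.5, -0.5]]],  # l1-norm 2.0
    [[[0.1, 0.0], [0.0, -0.1]]],   # l1-norm 0.2
    [[[1.0, 1.0], [-1.0, 1.0]]],   # l1-norm 4.0
    [[[0.0, 0.3], [0.0, 0.0]]],    # l1-norm 0.3
]

print(filters_to_prune(layer, 2))  # [1, 3]
```

The criterion looks only at weight magnitudes. It says nothing about how much a filter contributes to the network's predictions or how its removal affects latency on a given device, which are the two gaps the summary above attributes to existing methods.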
### Summary of key contributions:

- **Hardware-Aware Design Space Exploration (DSE)**: Systematically explore the filter pruning options of a given CNN, taking hardware characteristics into account.
- **XAI-Based Filter Importance Ranking**: Introduce new XAI metrics to evaluate the importance of the filters in each layer and determine the pruning strategy.
- **Automatically Determined Pruning Step Size**: Allocate an appropriate pruning step size to each layer through sensitivity analysis, without requiring user expertise.
- **Multi-Objective Evolutionary Algorithm (MOEA)**: Combine the MOEA with XAI-based filter pruning to optimize the error rate and the inference time and to find non-dominated configurations.

Through these improvements, the paper provides a more efficient and accurate CNN pruning method, which is especially suitable for embedded devices and resource-constrained environments.
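The term "non-dominated configurations" can be illustrated with a short sketch. A pruning configuration is non-dominated if no other configuration is at least as good in both objectives (error rate and inference time) and strictly better in one. The `Candidate` type, the function names, and all numbers below are invented for illustration and are not results from the paper. The sketch shows only the dominance filter, not the evolutionary search itself.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    error_rate: float    # lower is better
    inference_ms: float  # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` in both objectives and better in one."""
    no_worse = a.error_rate <= b.error_rate and a.inference_ms <= b.inference_ms
    better = a.error_rate < b.error_rate or a.inference_ms < b.inference_ms
    return no_worse and better


def non_dominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up pruning configurations (illustrative values only).
population = [
    Candidate("A", error_rate=0.07, inference_ms=5.0),
    Candidate("B", error_rate=0.08, inference_ms=3.5),
    Candidate("C", error_rate=0.09, inference_ms=4.0),  # dominated by B
    Candidate("D", error_rate=0.12, inference_ms=2.0),
    Candidate("E", error_rate=0.07, inference_ms=6.0),  # dominated by A
]

front = non_dominated(population)
print([c.name for c in front])  # ['A', 'B', 'D']
```

Each surviving candidate represents a different trade-off between accuracy and speed. Choosing one of them is left to the user and depends on the latency and accuracy budget of the deployment.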