Abstract: Unstructured deep neural network (DNN) pruning has been widely studied. However, previous schemes focused only on compressing the model's memory footprint, which has led to a relatively low reduction ratio in computational workload. This study demonstrates that the main reason is that memory footprint and workload are distributed inconsistently across the layers of a DNN model. Based on this observation, we formulate network pruning as a multi-objective optimization problem and design an improved genetic algorithm that efficiently explores the whole pruning-structure space, with both pruning goals equally constrained, to find a solution that strikes a judicious balance between the DNN's model size and workload. Experiments show that the proposed scheme achieves up to 34% further reduction in the model's computational workload compared with the state-of-the-art pruning schemes [11, 33] for ResNet50 on the ILSVRC-2012 dataset. We have also deployed the pruned ResNet50 models on a dedicated DNN accelerator. The measured data show a 6× reduction in inference time compared with an FPGA accelerator running a dense CNN model quantized to INT8, and a 2.27× improvement in power efficiency over a 2080Ti GPU-based implementation.
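The abstract attributes the low workload reduction to memory footprint and workload being distributed differently across layers. The toy calculation below illustrates this with four layer shapes from the standard ResNet50 architecture: conv1, one 3×3 convolution from conv2_x, one from conv5_x, and the final fully connected layer. It counts weights only and ignores biases and batch normalization. It is a sketch, not the paper's analysis. The `Layer` class and `reduction` helper are hypothetical, and the totals cover only these four layers. The code also makes an idealized assumption: the accelerator skips every pruned weight, so pruning a fraction r of a layer's weights removes the same fraction of that layer's multiply-accumulate operations (MACs).

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    k: int      # kernel size (k x k); 1 for the fully connected layer
    c_in: int   # input channels / features
    c_out: int  # output channels / features
    h_out: int  # output feature-map height (1 for the FC layer)
    w_out: int  # output feature-map width (1 for the FC layer)

    @property
    def params(self) -> int:
        # Weight count only; biases and batch-norm parameters are ignored.
        return self.k * self.k * self.c_in * self.c_out

    @property
    def macs(self) -> int:
        # Every weight is used once per output spatial position.
        return self.params * self.h_out * self.w_out


# Four layer shapes from the standard ResNet50 (224x224 input).
LAYERS = [
    Layer("conv1", 7, 3, 64, 112, 112),
    Layer("conv2_x 3x3", 3, 64, 64, 56, 56),
    Layer("conv5_x 3x3", 3, 512, 512, 7, 7),
    Layer("fc", 1, 2048, 1000, 1, 1),
]


def reduction(layers, rates):
    """Return (fraction of weights removed, fraction of MACs removed).

    Idealized assumption: pruning a fraction r of a layer's weights removes
    the same fraction r of that layer's MACs (zeros are skipped in hardware).
    """
    p_total = sum(layer.params for layer in layers)
    m_total = sum(layer.macs for layer in layers)
    p_cut = sum(r * layer.params for layer, r in zip(layers, rates))
    m_cut = sum(r * layer.macs for layer, r in zip(layers, rates))
    return p_cut / p_total, m_cut / m_total


if __name__ == "__main__":
    for layer in LAYERS:
        print(f"{layer.name:12s} params={layer.params:>10,d} MACs={layer.macs:>12,d}")
    # Memory-driven plan: prune only the two parameter-heavy layers by 90%.
    p, m = reduction(LAYERS, [0.0, 0.0, 0.9, 0.9])
    print(f"weights removed: {p:.1%}, MACs removed: {m:.1%}")
```

The conv2_x and conv5_x convolutions cost exactly the same number of MACs even though the latter has 64 times as many weights. Consequently, pruning only the two parameter-heavy layers by 90% removes about 89% of the weights in this subset but only about 30% of the MACs. A memory-only pruning objective cannot see this mismatch.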

What problem does this paper attempt to address?

The problem that this paper attempts to solve is that current neural network pruning methods fail to reduce the computational workload effectively while compressing the model's memory footprint. Specifically, the paper points out that existing methods focus mainly on reducing the memory footprint (i.e., the number of parameters). In doing so, they ignore the fact that memory footprint and computational workload are distributed differently across layers. In convolutional layers, each weight is reused at every output position. Early layers with large feature maps therefore hold few parameters but account for much of the computation, while later layers and the fully connected classifier hold most of the parameters but relatively little computation. As a result, removing parameters yields a much smaller reduction in computational workload in practice. To balance model size and computational workload more effectively, the authors formulate pruning as a multi-objective optimization problem. They then design an improved genetic algorithm-based method (GenExp) that explores the entire pruning-structure space and finds solutions with a reasonable balance between model size and computational workload. Experimental results show that this method can significantly reduce the computational workload without affecting the model's accuracy, thereby improving inference performance on dedicated hardware accelerators.

### Main problem summary:

1. **Inconsistency between memory footprint and computational workload**: Existing pruning methods focus only on compressing the memory footprint and neglect reducing the computational workload.
2. **Limitations of manually setting the pruning rate**: Manually setting the pruning rate for each layer is time-consuming and rarely reaches the optimal solution.
3. **Lack of exploration of sparse structures**: Existing unstructured pruning methods do not explore the space of sparse structures effectively, which limits the achievable performance improvement.

### Solutions:

- **Multi-objective optimization framework**: Model the pruning process as a multi-objective optimization problem that considers the reduction of both memory footprint and computational workload simultaneously.
- **Pruning process based on the genetic algorithm**: Use a genetic algorithm to automatically explore the pruning rate of each layer and find the best sparse structure that satisfies the memory and computational workload constraints.
- **Improved genetic algorithm**: Introduce improvements such as Gaussian initialization, progressive shrinkage mutation, and fine-grained crossover to improve search efficiency and solution quality.

Through these methods, the proposed pruning scheme can significantly reduce the computational workload while compressing the model size, thereby improving the model's runtime performance when deployed on real hardware.
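This summary names the three genetic-algorithm improvements but does not define them. The following Python sketch shows one plausible way they could fit together. It is not GenExp's implementation. Each chromosome is a vector of per-layer pruning rates, and the code reads the three operators as follows (all assumptions):

- "Gaussian initialization" draws the initial rates from a normal distribution centred on the target rate.
- "Progressive shrinkage mutation" is Gaussian mutation whose step size shrinks linearly over generations.
- "Fine-grained crossover" is per-layer (uniform) crossover.

The parameter and MAC counts are those of four ResNet50 layers (conv1, a 3×3 convolution from conv2_x and one from conv5_x, and the fully connected layer). `SENSITIVITY`, `TARGET_P`, `TARGET_M`, `R_MAX`, and `PENALTY` are hypothetical. In a real flow, the accuracy term would come from evaluating (and possibly fine-tuning) the pruned network rather than from a made-up quadratic proxy. The sketch also handles the two goals with a simple penalty-based scalarization rather than a Pareto-based multi-objective method, which is a simplification.

```python
import random

# Weight and MAC counts of four ResNet50 layers (conv1, a 3x3 conv from
# conv2_x, a 3x3 conv from conv5_x, fc); biases and batch norm ignored.
PARAMS = [9_408, 36_864, 2_359_296, 2_048_000]
MACS = [118_013_952, 115_605_504, 115_605_504, 2_048_000]
# Hypothetical sensitivity of accuracy to pruning each layer (made-up proxy).
SENSITIVITY = [3.0, 1.0, 0.5, 0.3]

TARGET_P = 0.80  # hypothetical: fraction of weights that must be removed
TARGET_M = 0.60  # hypothetical: fraction of MACs that must be removed
R_MAX = 0.98     # hypothetical cap on any single layer's pruning rate
PENALTY = 100.0  # same weight for both constraint violations


def reduction(rates, totals):
    """Fraction of the total (weights or MACs) removed by per-layer rates."""
    return sum(r * t for r, t in zip(rates, totals)) / sum(totals)


def fitness(rates):
    """Lower is better: proxy accuracy loss plus constraint penalties."""
    loss = sum(s * r * r for s, r in zip(SENSITIVITY, rates))
    violation = max(0.0, TARGET_P - reduction(rates, PARAMS))
    violation += max(0.0, TARGET_M - reduction(rates, MACS))
    return loss + PENALTY * violation


def clip(r):
    return min(max(r, 0.0), R_MAX)


def gaussian_init(n_layers, mean, std):
    # Assumed reading of "Gaussian initialization": start near the target rate.
    return [clip(random.gauss(mean, std)) for _ in range(n_layers)]


def crossover(a, b):
    # Assumed reading of "fine-grained crossover": choose each layer's rate
    # independently from either parent (uniform crossover).
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(rates, sigma, p=0.3):
    # Gaussian mutation of each gene with probability p; the caller shrinks sigma.
    return [clip(r + random.gauss(0.0, sigma)) if random.random() < p else r
            for r in rates]


def evolve(pop_size=40, generations=60, sigma0=0.2, seed=0):
    random.seed(seed)
    pop = [gaussian_init(len(PARAMS), TARGET_P, 0.1) for _ in range(pop_size)]
    for g in range(generations):
        # Assumed reading of "progressive shrinkage mutation": the mutation
        # step size decays linearly as the search converges.
        sigma = sigma0 * (1.0 - g / generations)
        pop.sort(key=fitness)
        elite = pop[: pop_size // 4]  # elitism keeps the best quarter
        children = []
        while len(children) < pop_size - len(elite):
            a, b = random.sample(elite, 2)
            children.append(mutate(crossover(a, b), sigma))
        pop = elite + children
    return min(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("per-layer rates:", [round(r, 2) for r in best])
    print(f"weights removed: {reduction(best, PARAMS):.1%}, "
          f"MACs removed: {reduction(best, MACS):.1%}")
```

Calling `evolve()` returns one rate vector, and `reduction(best, PARAMS)` and `reduction(best, MACS)` report how much memory and workload it removes. Penalizing both constraint violations with the same weight loosely mirrors the abstract's "both pruning goals equally constrained". Because of elitism, the best fitness never gets worse from one generation to the next, and the fixed seed makes the run reproducible.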

GenExp: Multi-objective pruning for deep neural network based on genetic algorithm

Class-Aware Pruning for Efficient Neural Networks

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

Structured Deep Neural Network Pruning by Varying Regularization Parameters

A Dynamic Pruning Method on Multiple Sparse Structures in Deep Neural Networks

Sparse optimization guided pruning for neural networks

Pruning the Deep Neural Network by Similar Function

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks

Differential Evolution Based Layer-Wise Weight Pruning for Compressing Deep Neural Networks

Optimization based Layer-wise Magnitude-based Pruning for DNN Compression

ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning

Pruning Deep Convolutional Neural Networks Architectures with Evolution Strategy

Neural network relief: a pruning algorithm based on neural activity

Multi-task Pruning via Filter Index Sharing: A Many-Objective Optimization Approach

Non-Parametric Adaptive Network Pruning

Multi-domain clustering pruning: Exploring space and frequency similarity based on GAN

SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency

Accelerator-Aware Pruning for Convolutional Neural Networks

HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning

Adaptive Search-and-Training for Robust and Efficient Network Pruning