GenExp: Multi-objective pruning for deep neural network based on genetic algorithm

Ke Xu, Dezheng Zhang, Jianjing An, Li Liu, Lingzhi Liu, Dong Wang
DOI: https://doi.org/10.1016/j.neucom.2021.04.022
IF: 6
2021-09-01
Neurocomputing
Abstract: Unstructured deep neural network (DNN) pruning has been widely studied. However, previous schemes focused only on compressing the model's memory footprint, which led to a relatively low reduction ratio in computational workload. This study demonstrates that the main reason is the inconsistent distribution of memory footprint and workload among the different layers of the DNN model. Based on this observation, we propose to formulate the network pruning flow as a multi-objective optimization problem and design an improved genetic algorithm, which can efficiently explore the whole pruning structure space with both pruning goals equally constrained, to find a suitable solution that strikes a judicious balance between the DNN's model size and workload. Experiments show that the proposed scheme can achieve up to 34% further reduction in the model's computational workload compared to the state-of-the-art pruning schemes [11, 33] for ResNet50 on the ILSVRC-2012 dataset. We have also deployed the pruned ResNet50 models on a dedicated DNN accelerator, and the measured data show a considerable 6× reduction in inference time compared to an FPGA accelerator implementing a dense CNN model quantized in INT8 format, and a 2.27× improvement in power efficiency over 2080Ti GPU-based implementations.
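The layer-wise mismatch between memory footprint and workload described in the abstract can be seen with simple arithmetic. The sketch below is only an illustration: the three layer shapes are ResNet-style values chosen for the example (they are not taken from the paper), and the names `Layer`, `conv_layer`, `fc_layer`, and `shares` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    params: int  # number of weights (proxy for memory footprint)
    macs: int    # multiply-accumulates per inference (proxy for workload)


def conv_layer(name, k, c_in, c_out, h_out, w_out):
    """A k x k convolution reuses every weight at each output position."""
    params = k * k * c_in * c_out
    return Layer(name, params, params * h_out * w_out)


def fc_layer(name, n_in, n_out):
    """A fully connected layer uses every weight exactly once."""
    params = n_in * n_out
    return Layer(name, params, params)


# Illustrative shapes (assumed for this example, not the paper's data).
layers = [
    conv_layer("early_conv3x3", 3, 64, 64, 56, 56),
    conv_layer("late_conv3x3", 3, 512, 512, 7, 7),
    fc_layer("classifier", 2048, 1000),
]


def shares(layers):
    """Return each layer's share of the total weights and of the total MACs."""
    total_p = sum(l.params for l in layers)
    total_m = sum(l.macs for l in layers)
    return {l.name: (l.params / total_p, l.macs / total_m) for l in layers}


for name, (p, m) in shares(layers).items():
    print(f"{name:>14}: {p:6.1%} of weights, {m:6.1%} of MACs")
```

In this toy setup the early convolution holds under 1% of the weights but roughly half of the MACs, while the classifier holds close to half of the weights and under 1% of the MACs. A pruning scheme that only targets weight count would therefore concentrate on layers that contribute little workload.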
computer science, artificial intelligence
What problem does this paper attempt to address?
This paper addresses the fact that current neural network pruning methods compress the model's memory footprint without effectively reducing its computational workload. Specifically, the paper points out that existing methods mainly focus on reducing the memory footprint (i.e., the number of parameters) and ignore the fact that memory footprint and computational workload are distributed differently across layers, so the workload reduction achieved in practice is poor. To balance model size and computational workload more effectively, the authors formulate pruning as a multi-objective optimization problem and design GenExp, an improved genetic-algorithm-based method that explores the entire pruning structure space to find solutions with a reasonable balance between model size and computational workload. Experimental results show that this method can significantly reduce the computational workload without affecting the model's accuracy, thereby improving inference performance on dedicated hardware accelerators.

### Main problem summary:

1. **Inconsistency between memory footprint and computational workload**: Existing pruning methods only focus on compressing the memory footprint and ignore the reduction of computational workload.
2. **Limitations of manually setting the pruning rate**: Manually setting the pruning rate for each layer is time-consuming and unlikely to reach the optimal solution.
3. **Lack of exploration of sparse structures**: Existing unstructured pruning methods lack effective exploration of sparse structures, resulting in limited performance improvement.

### Solutions:

- **Multi-objective optimization framework**: Model the pruning process as a multi-objective optimization problem that considers the reduction of both memory footprint and computational workload simultaneously.
- **Pruning process based on the genetic algorithm**: Use the genetic algorithm to automatically explore the pruning rate of each layer and find the best sparse structure that satisfies the memory and computational workload constraints.
- **Improved genetic algorithm**: Introduce improvements such as Gaussian initialization, progressive shrinkage mutation, and fine-grained crossover to raise search efficiency and solution quality.

Through these methods, the pruning scheme proposed in the paper can significantly reduce the computational workload while compressing the model size, thereby improving the model's runtime performance in actual hardware deployment.
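To make the search described above more concrete, here is a minimal sketch of a genetic algorithm over per-layer pruning rates with size and workload budgets penalized equally. It is not the authors' implementation. The layer statistics, the budgets, and the `damage` term (a placeholder for accuracy loss, which a real flow would measure by evaluating the pruned network) are all assumptions. The three operators are one plausible reading of the names in the summary: Gaussian initialization as sampling rates around a mean, progressive shrinkage mutation as a mutation step that shrinks over generations, and fine-grained crossover as per-layer gene mixing.

```python
import random

# (weights, MACs) per layer: illustrative values, not from the paper.
LAYERS = [(36_864, 115_605_504), (2_359_296, 115_605_504), (2_048_000, 2_048_000)]
TOTAL_PARAMS = sum(p for p, _ in LAYERS)
TOTAL_MACS = sum(m for _, m in LAYERS)


def remaining(rates):
    """Fractions of weights and MACs left when layer i is pruned at rates[i]."""
    size = sum(p * (1 - r) for (p, _), r in zip(LAYERS, rates)) / TOTAL_PARAMS
    work = sum(m * (1 - r) for (_, m), r in zip(LAYERS, rates)) / TOTAL_MACS
    return size, work


def fitness(rates, size_budget=0.3, work_budget=0.3):
    """Higher is better; both budgets are penalized with the same weight."""
    size, work = remaining(rates)
    violation = max(0.0, size - size_budget) + max(0.0, work - work_budget)
    damage = sum(r * r for r in rates) / len(rates)  # placeholder accuracy proxy
    return -(10.0 * violation + damage)


def clip(x):
    return min(0.95, max(0.0, x))


def init_population(n, rng, mean=0.7, std=0.15):
    """Assumed Gaussian initialization: rates sampled around a mean rate."""
    return [[clip(rng.gauss(mean, std)) for _ in LAYERS] for _ in range(n)]


def crossover(a, b, rng):
    """Assumed fine-grained crossover: each layer's rate comes from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(rates, std, rng):
    return [clip(r + rng.gauss(0.0, std)) for r in rates]


def evolve(generations=60, pop_size=40, seed=0):
    rng = random.Random(seed)
    pop = init_population(pop_size, rng)
    for g in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]           # elites survive unchanged
        std = 0.2 * (1 - g / generations)      # assumed shrinking mutation step
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), std, rng))
        pop = elite + children
    return max(pop, key=fitness)


best = evolve()
print("rates:", [round(r, 2) for r in best], "remaining:", remaining(best))
```

Because the elites are carried over unchanged, the best fitness never decreases from one generation to the next. Calling `remaining([0.0, 0.0, 0.9])` shows why a size-only objective is misleading in this toy setup: pruning 90% of the classifier removes over 40% of the weights but less than 1% of the MACs.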