Deep Convolutional Neural Networks Structured Pruning via Gravity Regularization

Abdesselam Ferdi
2024-11-26
Abstract: Structured pruning is a widely employed strategy for accelerating deep convolutional neural networks (DCNNs). However, existing methods often necessitate modifications to the original architectures, involve complex implementations, and require lengthy fine-tuning stages. To address these challenges, we propose a novel physics-inspired approach that integrates the concept of gravity into the training stage of DCNNs. In this approach, the gravity is directly proportional to the product of the masses of the convolution filter and the attracting filter, and inversely proportional to the square of the distance between them. We applied this force to the convolution filters, either drawing filters closer to the attracting filter (experiencing weaker gravity) toward non-zero weights or pulling filters farther away (subject to stronger gravity) toward zero weights. As a result, filters experiencing stronger gravity have their weights reduced to zero, enabling their removal, while filters under weaker gravity retain significant weights and preserve important information. Our method simultaneously optimizes the filter weights and ranks their importance, eliminating the need for complex implementations or extensive fine-tuning. We validated the proposed approach on popular DCNN architectures using the CIFAR dataset, achieving competitive results compared to existing methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper asks how to accelerate deep convolutional neural networks (DCNNs) through structured pruning without modifying the original architecture, without complex implementation or lengthy fine-tuning, and while maintaining the accuracy of the model. Specifically, the author proposes a method based on the physical concept of gravity that optimizes the weights of the convolutional filters and ranks the filters by importance. The method reduces model complexity and lowers computational and memory requirements, so that deep learning models can run more efficiently on resource-constrained devices.

### Main problems

1. **Limitations of existing methods**:
   - Existing structured pruning methods usually require modifying the original architecture.
   - Their implementation is complex and requires a lengthy fine-tuning phase.
2. **Objectives**:
   - Propose a physics-inspired method that introduces the concept of gravity into the training stage of DCNNs.
   - Use the gravitational force to redistribute the filter weights so that pruning becomes effective.
   - Keep the method easy to implement, require no modification of the original architecture, and support different pruning ratios without retraining.

### Solution overview

The author proposes a structured pruning method based on gravitational regularization. In this method, the gravitational force is proportional to the product of the masses of the convolutional filter and the attracting filter, and inversely proportional to the square of the distance between them. Depending on its strength, the force pulls the weights of a filter toward zero or toward non-zero values, thus achieving sparsity.
Eventually, filters subject to stronger gravitational forces are removed, while filters with larger weights, which experience weaker gravitational forces, are retained.

### Mathematical expressions

The gravitational formula is:

\[ F = G \frac{m_1 m_n}{d^2} \]

where:

- \( F \) is the magnitude of the gravitational force,
- \( G \) is the gravitational constant,
- \( m_1 \) and \( m_n \) are the masses of the attracting filter and the convolutional filter, respectively,
- \( d \) is the distance between the two filters.

In DCNNs, the mass \( m_{n,l} \) of filter \( n \) in layer \( l \) is defined as the L1 norm of its weights:

\[ m_{n,l} = \|W_{n,l}\|_1 \]

The distance \( d \) is first defined as the absolute difference between the two filter indices:

\[ d = |p_1 - p_n| \]

To achieve the desired effect, in which filters farther from the attracting filter experience stronger gravity, the distance is redefined as the reciprocal of that difference:

\[ d_{n,l} = \frac{1}{|p_1 - p_n|} \]

Finally, the gravitational regularization term \( F_{n,l} \) is expressed as:

\[ F_{n,l} = G \frac{m_{1,l} \, m_{n,l}}{d_{n,l}^2} \]

This regularization term is added to the loss function:

\[ \tilde{J} = J(w_{n,l}; X, y) + \alpha_g F_{n,l} \]

where \( \alpha_g \) is the gravitational rate, which adjusts the contribution of the gravitational term relative to the standard loss function.

### Conclusion

Experiments on the CIFAR dataset show that the method reduces model parameters and computation without significantly reducing accuracy, with results competitive with existing pruning methods. Its efficiency and flexibility make it relevant to demanding application scenarios such as autonomous driving and the medical field.
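
The regularization term and the regularized loss given under "Mathematical expressions" can be illustrated with a small plain-Python sketch. It is not the author's implementation, and it rests on several assumptions that the summary does not state: the function names and the default `G = 1.0` are hypothetical, the first filter of each layer is taken as the attracting filter, the per-filter terms \( F_{n,l} \) are summed over all filters and layers, the attracting filter itself contributes zero (which avoids the undefined \( 1/0 \) distance at \( n = 1 \)), and filters are ranked for removal by their L1 mass.

```python
# Illustrative sketch only. Function names, the default G, the choice of
# attracting filter, the summation over filters/layers, and ranking by L1
# mass are assumptions made for this example, not details from the paper.

def filter_mass(weights):
    """Mass of one filter: the L1 norm of its flattened weights."""
    return sum(abs(w) for w in weights)


def gravity_penalty(layer, G=1.0):
    """Sum of F_n = G * m_1 * m_n / d_n**2 over the filters of one layer.

    `layer` is a list of filters, each a flat list of floats. The first
    filter (index 0 here, p_1 in the text) is assumed to be the attractor.
    Since d_n = 1 / |p_1 - p_n|, dividing by d_n**2 is the same as
    multiplying by (p_1 - p_n)**2, so no division is needed and the
    attracting filter itself contributes zero.
    """
    m1 = filter_mass(layer[0])
    total = 0.0
    for n, weights in enumerate(layer):
        index_gap_sq = float(n * n)  # (p_1 - p_n)**2 with p_1 at index 0
        total += G * m1 * filter_mass(weights) * index_gap_sq
    return total


def regularized_loss(task_loss, layers, alpha_g, G=1.0):
    """J~ = J + alpha_g * (gravity penalty summed over all layers)."""
    return task_loss + alpha_g * sum(gravity_penalty(layer, G) for layer in layers)


def filters_to_keep(layer, pruning_ratio):
    """Indices of the filters kept after dropping the lowest-mass fraction."""
    n_prune = int(len(layer) * pruning_ratio)
    ranked = sorted(range(len(layer)), key=lambda n: filter_mass(layer[n]))
    return sorted(ranked[n_prune:])


if __name__ == "__main__":
    toy_layer = [[1.0, -1.0], [0.5, 0.5], [0.0, 0.1]]  # masses 2.0, 1.0, 0.1
    print(gravity_penalty(toy_layer))
    print(regularized_loss(1.0, [toy_layer], alpha_g=0.1))
    print(filters_to_keep(toy_layer, pruning_ratio=0.5))
```

In the toy layer, the squared index gap multiplies the penalty of the third filter by 4 and that of the second by 1, so filters farther from the attractor are penalized more heavily per unit of mass. In an actual training loop the penalty would be computed on the framework's weight tensors so that gradients flow through it; the list-based version above only shows the arithmetic.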