Improving Network Slimming with Nonconvex Regularization

Kevin Bui, Fredrick Park, Shuai Zhang, Yingyong Qi, Jack Xin
DOI: https://doi.org/10.48550/arXiv.2010.01242
2021-08-19
Abstract: Convolutional neural networks (CNNs) have developed to become powerful models for various computer vision tasks ranging from object detection to semantic segmentation. However, most of the state-of-the-art CNNs cannot be deployed directly on edge devices such as smartphones and drones, which need low latency under limited power and memory bandwidth. One popular, straightforward approach to compressing CNNs is network slimming, which imposes $\ell_1$ regularization on the channel-associated scaling factors via the batch normalization layers during training. Network slimming thereby identifies insignificant channels that can be pruned for inference. In this paper, we propose replacing the $\ell_1$ penalty with an alternative nonconvex, sparsity-inducing penalty in order to yield a more compressed and/or accurate CNN architecture. We investigate $\ell_p (0 < p < 1)$, transformed $\ell_1$ (T$\ell_1$), minimax concave penalty (MCP), and smoothly clipped absolute deviation (SCAD) due to their recent successes and popularity in solving sparse optimization problems, such as compressed sensing and variable selection. We demonstrate the effectiveness of network slimming with nonconvex penalties on three neural network architectures -- VGG-19, DenseNet-40, and ResNet-164 -- on standard image classification datasets. Based on the numerical experiments, T$\ell_1$ preserves model accuracy against channel pruning, $\ell_{1/2, 3/4}$ yield better compressed models with similar accuracies after retraining as $\ell_1$, and MCP and SCAD provide more accurate models after retraining with similar compression as $\ell_1$. Network slimming with T$\ell_1$ regularization also outperforms the latest Bayesian modification of network slimming in compressing a CNN architecture in terms of memory storage while preserving its model accuracy after channel pruning.
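To make the penalties concrete, here is a minimal NumPy sketch of the standard elementwise forms of ℓ1, ℓp, Tℓ1, MCP, and SCAD, summed over batch-normalization scaling factors γ as in network slimming. The formulas follow the usual definitions from the sparse-optimization literature; the default parameter values (`p=0.5`, `a=1.0` for Tℓ1, `lam=1.0, a=3.0` for MCP, `lam=1.0, a=3.7` for SCAD) and the regularization `weight` are illustrative assumptions rather than the paper's settings, and `slimming_penalty` is a hypothetical helper.

```python
import numpy as np

def l1(x):
    # Convex baseline used by the original network slimming.
    return np.abs(x)

def lp(x, p=0.5):
    # l_p "norm" penalty with 0 < p < 1 (nonconvex).
    return np.abs(x) ** p

def transformed_l1(x, a=1.0):
    # Transformed l1: (a + 1)|x| / (a + |x|), with a > 0; bounded above by a + 1.
    ax = np.abs(x)
    return (a + 1.0) * ax / (a + ax)

def mcp(x, lam=1.0, a=3.0):
    # Minimax concave penalty (a > 1); constant a*lam^2/2 once |x| > a*lam.
    ax = np.abs(x)
    return np.where(ax <= a * lam, lam * ax - ax**2 / (2 * a), a * lam**2 / 2)

def scad(x, lam=1.0, a=3.7):
    # Smoothly clipped absolute deviation (a > 2): linear, then quadratic, then constant.
    ax = np.abs(x)
    middle = (2 * a * lam * ax - ax**2 - lam**2) / (2 * (a - 1))
    return np.where(ax <= lam, lam * ax,
                    np.where(ax <= a * lam, middle, (a + 1) * lam**2 / 2))

def slimming_penalty(gammas, penalty, weight=1e-4, **kwargs):
    # Hypothetical helper: weighted sum of the penalty over the scaling
    # factors of every batch-normalization layer (one array per layer).
    return weight * sum(np.sum(penalty(g, **kwargs)) for g in gammas)

# Usage: compare the penalties on a toy set of scaling factors from two BN layers.
gammas = [np.array([0.5, -0.01, 1.2]), np.array([0.0, 2.0])]
for name, pen in [("l1", l1), ("lp", lp), ("tl1", transformed_l1), ("mcp", mcp), ("scad", scad)]:
    print(name, slimming_penalty(gammas, pen, weight=1.0))
```

In training, such a term would be added to the classification loss. The derivatives of all four nonconvex penalties decay as |γ| grows (and vanish beyond `a * lam` for MCP and SCAD), whereas ℓ1 applies the same shrinkage pressure to every nonzero factor.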
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to improve network slimming, a channel-pruning technique for compressing convolutional neural networks (CNNs). Specifically, the authors propose replacing the conventional ℓ1 regularization on the batch-normalization scaling factors with nonconvex, sparsity-inducing regularization, with the goal of obtaining more compressed models without sacrificing accuracy. By using nonconvex penalties such as ℓp (0 < p < 1), transformed ℓ1 (Tℓ1), the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD), the authors seek a better way to identify and remove unimportant channels, thereby reducing the number of parameters and the computational cost while maintaining or even improving accuracy. The main contributions of the paper are:
1. **Nonconvex regularization for network slimming**: Replacing the ℓ1 penalty with nonconvex penalties (ℓp, Tℓ1, MCP, and SCAD) to achieve more effective model compression.
2. **Experimental validation**: Experiments on three architectures, VGG-19, DenseNet-40, and ResNet-164, on standard image classification datasets including CIFAR-10, CIFAR-100, and SVHN, demonstrate the effectiveness of nonconvex regularization in network slimming.
3. **Performance comparison**: With Tℓ1 regularization, models retain their accuracy after channel pruning; ℓ1/2 and ℓ3/4 yield more compressed models after retraining, with accuracy comparable to ℓ1; and MCP and SCAD yield higher accuracy after retraining, with compression similar to ℓ1.
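After training with any of these penalties, channels whose scaling factors are near zero are removed. Below is a minimal sketch of that selection step, assuming a single global threshold chosen so that a target fraction (`prune_ratio`) of all |γ| values across layers falls at or below it, which is how the original network slimming method is commonly described. `channel_masks` and its keep-at-least-one-channel safeguard are hypothetical choices for illustration, not necessarily what the authors implemented.

```python
import numpy as np

def channel_masks(gammas, prune_ratio):
    """Hypothetical helper: return one boolean keep-mask per BN layer.

    gammas: list of 1-D arrays of scaling factors (one array per layer).
    prune_ratio: fraction of all channels (network-wide) to prune.
    """
    all_abs = np.concatenate([np.abs(g).ravel() for g in gammas])
    k = int(prune_ratio * all_abs.size)  # number of channels to prune globally
    if k == 0:
        return [np.ones(g.shape, dtype=bool) for g in gammas]
    # Global threshold: the k-th smallest |gamma| over all layers.
    threshold = np.sort(all_abs)[k - 1]
    masks = []
    for g in gammas:
        mask = np.abs(g) > threshold  # keep channels strictly above the threshold
        if not mask.any():
            # Assumed safeguard: never delete a whole layer; keep its largest channel.
            mask[np.argmax(np.abs(g))] = True
        masks.append(mask)
    return masks

# Usage: prune half of the channels of two toy BN layers.
gammas = [np.array([0.9, 0.01, 0.5]), np.array([0.02, 0.7, 0.03])]
masks = channel_masks(gammas, prune_ratio=0.5)
kept = sum(int(m.sum()) for m in masks)
print(f"kept {kept} of {sum(g.size for g in gammas)} channels")  # prints "kept 3 of 6 channels"
```

Each mask would then select the output channels of the corresponding convolution (and the matching input channels of the next layer) when building the slimmed network, which is subsequently retrained.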
In conclusion, this paper improves network slimming by introducing nonconvex regularization terms, with the aim of achieving more efficient model compression without losing accuracy. This offers a new approach for deploying complex CNN models on resource-constrained devices.