Abstract: Convolutional neural networks (CNNs) are reported to be overparametrized. The search for an optimal (minimal) and sufficient architecture is an NP-hard problem, as the hyperparameter space of possible network configurations is vast. Here, we introduce a layer-by-layer, data-driven pruning method based on a computationally scalable entropic relaxation of the pruning problem. The sparse subnetwork is found from the pre-trained (full) CNN using network entropy minimization as a sparsity constraint. This allows deploying a numerically scalable algorithm with a sublinear scaling cost. The method is validated on several benchmarks (architectures): (i) MNIST (LeNet) with sparsity 55%-84% and loss in accuracy 0.1%-0.5%, and (ii) CIFAR-10 (VGG-16, ResNet18) with sparsity 73%-89% and loss in accuracy 0.1%-0.5%.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the over-parameterization of convolutional neural networks (CNNs). Specifically, the authors propose a data-driven, layer-by-layer sparsification method that optimizes the structure of a CNN through an entropic relaxation of the pruning problem. The method aims to find a sparse sub-network of a pre-trained full CNN while maintaining high performance. The main contributions are:

1. **Algorithm adaptation**: Adapt the SPARTAn algorithm to the sparsification of convolutional layers, demonstrating that for convolutional layers with arbitrary support the algorithm keeps its sub-linear cost scaling and its ability to handle small data.
2. **Verification of effectiveness**: Verify the effectiveness of the method on the MNIST and CIFAR-10 datasets, using multiple convolutional network architectures (LeNet, VGG-16, ResNet18). For example, on CIFAR-10, 89% of the weights can be removed from VGG-16 with a performance loss of less than 0.1%.
3. **Redundancy analysis**: Determine which layers have the most redundancy and can therefore be pruned more aggressively.
4. **Weight importance**: Examine the claim that the value of network pruning lies in finding the optimal network architecture rather than in the specific weight values. Training randomly initialized weights from scratch is used to verify the usefulness of retaining the weights of the pre-trained sparse model.

### Main methods

1. **Entropy regularization**: Achieve network sparsification by using entropy minimization as a sparsity constraint in the regression problem.
2. **Layer-by-layer sparsification**: Interpret each convolutional layer as a fully-connected layer and sparsify it by solving a linear regression task with entropy regularization.
3. **Experimental verification**: Conduct experiments on multiple benchmark datasets and network architectures to verify the effectiveness and practicality of the method.
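The interpretation of a convolutional layer as a fully-connected (linear regression) layer can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `unfold_patches` and `conv_as_linear`, the stride-1 "valid" convolution, and the row layout (input channel `d`, kernel offset `l`) are assumptions chosen for illustration. The per-channel weights are used here as plain gates and are not normalized to sum to one.

```python
import numpy as np

def unfold_patches(x, k):
    """Unfold an input of shape (D, H, W) into a matrix of shape (k*k*D, T).

    Row d*k*k + l holds kernel offset l of input channel d; each column is
    one output position (stride 1, no padding), so T = (H-k+1) * (W-k+1).
    """
    D, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((D * k * k, out_h * out_w))
    for d in range(D):
        for i in range(k):
            for j in range(k):
                patch = x[d, i:i + out_h, j:j + out_w]
                cols[d * k * k + i * k + j] = patch.reshape(-1)
    return cols

def conv_as_linear(x, kernels, bias, channel_weights):
    """Evaluate a conv layer as a linear map on the unfolded input.

    kernels has shape (M, D, k, k); channel_weights has shape (D,).
    A channel weight of zero removes that input channel from the layer.
    """
    M, D, k, _ = kernels.shape
    X = unfold_patches(x, k)                    # (k*k*D, T)
    Lam = kernels.reshape(M, D * k * k)         # (M, k*k*D)
    w_rep = np.repeat(channel_weights, k * k)   # one gate per input channel
    return bias[:, None] + (Lam * w_rep) @ X    # (M, T)
```

In this formulation, pruning an input channel amounts to setting its gate to zero, which gives the same output as deleting that channel and the matching kernel slices.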
### Formula representation

- **Discrete Shannon entropy**:

  \[ H(w)=-\sum_{i=1}^{d}w_{i}\log w_{i} \]

  where \(w=(w_{1},\ldots,w_{d})\in\mathbb{R}_{\geq0}^{d}\) and \(\sum_{i=1}^{d}w_{i}=1\).

- **Sparse entropy regression loss function**:

  \[ L_{\text{sparsify}}(w,\Lambda)=\epsilon_{w}\sum_{d=1}^{D}w_{d}\log w_{d}+\epsilon_{l2}\sum_{m=1}^{M}\sum_{d=1}^{k^{2}D}\Lambda_{m,d}^{2}+\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}\left(Y_{m,t}-\Lambda_{m,0}-\sum_{d=1}^{D}w_{d}\sum_{l=1}^{k^{2}}\Lambda_{m,(d-1)k^{2}+l}X_{(d-1)k^{2}+l,t}\right)^{2} \]

### Experimental results

- **Performance of LeNet on MNIST**:
  - Sparsifying only the convolutional layers, with the number of parameters reduced by 40% and 60%, lowers performance by 0.3% and 1.69% respectively.
  - Sparsifying the entire network, with the number of parameters reduced by 55% to 84%, lowers performance by between 0.1% and 0.5%.
- **Performance of VGG-16 on CIFAR-10**:
  - Selectively sparsifying different groups of convolutional layers, with the number of parameters reduced by 54.59% to 99.95%, lowers performance by between 0.1% and 3.21%.

### Conclusion

The method proposed in this paper can effectively reduce the number of parameters in convolutional neural networks while maintaining high performance, providing an optimization scheme for large-scale applications.
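As a rough illustration of the sparse entropy regression loss given under "Formula representation", the following NumPy sketch evaluates its three terms for given \(w\) and \(\Lambda\). It only evaluates the loss and does not reproduce the SPARTAn optimization procedure. The function name `sparsify_loss`, the array layout (intercept \(\Lambda_{m,0}\) stored in column 0 of `Lam`), and the convention \(0\log 0=0\) are assumptions made for this example.

```python
import numpy as np

def sparsify_loss(w, Lam, X, Y, k, eps_w, eps_l2):
    """Evaluate L_sparsify for channel weights w and regression matrix Lam.

    w   : (D,) nonnegative weights summing to 1
    Lam : (M, 1 + k*k*D); column 0 holds the intercepts Lambda_{m,0}
    X   : (k*k*D, T) unfolded inputs;  Y : (M, T) layer outputs to match
    """
    # Entropy term eps_w * sum_d w_d log w_d, using 0 * log(0) = 0.
    safe_w = np.where(w > 0, w, 1.0)
    entropy_term = eps_w * np.sum(w * np.log(safe_w))
    # Ridge penalty on all regression coefficients except the intercept.
    l2_term = eps_l2 * np.sum(Lam[:, 1:] ** 2)
    # Input channel d scales its k*k features by w_d.
    w_rep = np.repeat(w, k * k)
    pred = Lam[:, :1] + (Lam[:, 1:] * w_rep) @ X
    # Squared reconstruction error, averaged over the T samples.
    fit_term = np.sum((Y - pred) ** 2) / X.shape[1]
    return entropy_term + l2_term + fit_term
```

With this layout, a uniform \(w\) over \(D\) channels contributes \(-\epsilon_{w}\log D\) to the entropy term, while a \(w\) concentrated on a single channel contributes zero.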

Towards Generalized Entropic Sparsification for Convolutional Neural Networks
