Abstract: While deep neural networks are highly effective at solving complex tasks, their computational demands can hinder their usefulness in real-time applications and with limited-resources systems. Besides, for many tasks it is known that these models are over-parametrized: neoteric works have broadly focused on reducing the width of these networks, rather than their depth. In this paper, we aim to reduce the depth of over-parametrized deep neural networks: we propose an eNtropy-basEd Pruning as a nEural Network depTH's rEducer (NEPENTHE) to alleviate deep neural networks' computational burden. Based on our theoretical finding, NEPENTHE focuses on un-structurally pruning connections in layers with low entropy to remove them entirely. We validate our approach on popular architectures such as MobileNet and Swin-T, showing that when encountering an over-parametrization regime, it can effectively linearize some layers (hence reducing the model's depth) with little to no performance loss. The code will be publicly available upon acceptance of the article.

What problem does this paper attempt to address?

The paper addresses the excessive computational cost of deep neural networks (DNNs) in practical applications. Specifically:

1. **Computational resource requirements**: Although DNNs perform very well on complex tasks, their large computational requirements limit their use in real-time applications and resource-limited systems.
2. **Model over-parameterization**: Many DNN models are over-parameterized, that is, they contain far more parameters than the task actually needs. This increases the computational burden and can also create hardware and energy-consumption challenges during training and deployment.

To address these problems, the paper proposes NEPENTHE (eNtropy-basEd Pruning as a nEural Network depTH's rEducer), an entropy-based pruning method that reduces the number of layers of over-parameterized deep neural networks and thereby their computational burden.

### Main contributions of NEPENTHE

1. **Entropy-based single-neuron measurement**: Defines the entropy of a neuron as a measure of how much it uses the linear part of its activation; minimizing the average entropy of a layer allows the entire layer to be linearized (formulas (5) and (6)).
2. **Theoretical and empirical verification**: Proves that unstructured pruning naturally reduces the entropy of a layer whose activation function is a rectifier, and verifies this experimentally (formulas (7) - (12)).
3. **Entropy-guided pruning strategy**: Proposes a new method, NEPENTHE, which gradually reduces the depth of the model by re-weighting the pruning budget at the layer level (formulas (13) - (15)).
4. **Experimental verification**: Verifies the effectiveness of the method on multiple popular architectures (such as MobileNet and Swin-T), showing that in an over-parameterized regime some layers can be effectively linearized with almost no performance loss.
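The neuron-entropy measure in contribution 1 can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and it assumes that a neuron's ON/OFF state is read from the sign of the value passed in (for example the pre-activation when the activation is a rectifier) and that exact zeros are left out of the probability estimate.

```python
import math
from typing import Sequence


def neuron_entropy(values: Sequence[float]) -> float:
    """Entropy in bits of one neuron's ON/OFF state over a batch of inputs."""
    on = sum(1 for v in values if v > 0)    # state +1
    off = sum(1 for v in values if v < 0)   # state -1
    total = on + off  # assumption: exact zeros are excluded from the estimate
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in (on, off):
        if count:  # a state that never occurs contributes 0 to the sum
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def layer_average_entropy(layer_values: Sequence[Sequence[float]]) -> float:
    """Mean entropy over the neurons of a layer.

    layer_values[i] holds the values observed for neuron i over the batch.
    """
    return sum(neuron_entropy(v) for v in layer_values) / len(layer_values)
```

Under these assumptions, a neuron that is always ON or always OFF has zero entropy, so a layer whose average entropy reaches zero behaves linearly on the observed data; a neuron that is ON for half of the inputs has the maximum entropy of 1 bit.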
### Formula summary

- **Neuron output**: \[ y^x_{l,i}=\psi(z^x_{l,i}) \] where \(z^x_{l,i}\) is the pre-activation output of the \(i\)-th neuron in the \(l\)-th layer and \(\psi\) is the activation function.
- **Neuron state**: \[ s^x_{l,i}=\begin{cases}+1&\text{if }y^x_{l,i}>0\\-1&\text{if }y^x_{l,i}<0\\0&\text{if }y^x_{l,i} = 0\end{cases} \]
- **Neuron entropy**: \[ H_{l,i}=-\sum_{s_{l,i}=\pm1}p(s_{l,i})\log_2[p(s_{l,i})] \]
- **Layer average entropy**: \[ \bar{H}_l=\frac{1}{N_l}\sum_iH_{l,i} \]
- **Weight pruning score**: \[ I_l=\frac{1}{N_l}\sum_{i = 1}^{N_l}\bar{H}_{l,i}\cdot\frac{1}{\|w_{l,i}\|_0|w_{l,i}|} \]
- **Inter-layer pruning correlation score**: \[ R_l=\begin{cases}\frac{\sum_{j\in L}I_j}{I_l}&\text{if }I_l\neq0\\0&\text{otherwise}\end{cases} \]
- **Entropy-weighted pruning parameter budget**: \[ \|w_l\|_{pruned}^0=\|w\|_{pruned}^0\cdot\frac{\exp[R_l]}{\sum_j\exp[R_j]} \]

Through these methods, NEPENTHE can...
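The last two formulas, \(R_l\) and the per-layer budget, can be sketched as follows. This is an illustrative reading of the formulas above, not the authors' implementation: the function names are hypothetical, the per-layer values \(I_l\) are assumed to be given as a plain list, and the sum over \(j\in L\) is assumed to run over all layers in that list. Subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the softmax unchanged.

```python
import math
from typing import List, Sequence


def relevance_scores(layer_scores: Sequence[float]) -> List[float]:
    """R_l = (sum_j I_j) / I_l if I_l != 0, else 0."""
    total = sum(layer_scores)
    return [total / s if s != 0 else 0.0 for s in layer_scores]


def layer_budgets(layer_scores: Sequence[float], total_budget: float) -> List[float]:
    """Split the total number of weights to prune across layers via softmax(R)."""
    r = relevance_scores(layer_scores)
    r_max = max(r)
    # Shifting by the maximum avoids overflow and does not change the ratios.
    exps = [math.exp(x - r_max) for x in r]
    norm = sum(exps)
    return [total_budget * e / norm for e in exps]


if __name__ == "__main__":
    # Hypothetical I_l values for three layers and a budget of 1000 weights.
    for layer, budget in enumerate(layer_budgets([1.0, 2.0, 4.0], 1000.0)):
        print(f"layer {layer}: prune about {budget:.1f} weights")
```

Read this way, a layer with a smaller non-zero \(I_l\) gets a larger \(R_l\) and therefore a larger share of the pruning budget, while a layer with \(I_l = 0\) gets \(R_l = 0\) and the smallest share.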

NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer
