Abstract: The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus foregoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to nonzero biases, including explicit 'looks-linear' approaches for ReLU activation functions. These do not only enable truly orthogonal parameter initialization but also reduce potential pruning errors. In experiments on standard benchmark data, we further highlight the practical benefits of nonzero bias initialization schemes, and present theoretically inspired extensions for state-of-the-art strong lottery ticket pruning.


### What problem does this paper attempt to solve?

This paper addresses nonzero-bias initialization in the Strong Lottery Ticket Hypothesis (SLTH). Specifically, it aims to:

1. **Demonstrate the limitations of zero-bias initialization**:
   - Most existing SLTH proofs and initialization schemes focus on zero-bias networks, which forgo the universal approximation property: a neural network without biases cannot approximate every continuous function.
2. **Extend the Strong Lottery Ticket Hypothesis to nonzero-bias networks**:
   - To fill this gap, the paper extends common initialization schemes to nonzero biases and proves that the SLTH still holds in this setting. With appropriate initialization and pruning, strong lottery tickets (sparse subnetworks of the randomly initialized network) with nonzero biases can be found, so that pruning alone can achieve universal approximation.
3. **Improve the pruning algorithm to discover strong lottery tickets with biases**:
   - Existing pruning algorithms such as edge-popup cannot recover bias parameters and find only relatively dense tickets. The paper therefore proposes several improvements, including extending the popup scores to bias terms and gradually adjusting the network's sparsity, to find sparser, higher-quality strong lottery tickets.
4. **Verify the practical feasibility of the theoretical results**:
   - Experiments with the proposed nonzero-bias initialization schemes and pruning algorithm indicate that these methods can find highly sparse strong lottery tickets that perform well in practice.
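The third point above, scoring bias terms alongside weights and tightening sparsity step by step, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the helper names `top_k_mask` and `masked_layer`, the joint ranking of weight and bias scores within one layer, and the linear `keep_schedule` are all assumptions made for this example, and the scores here are random rather than learned.

```python
import numpy as np

def top_k_mask(scores, keep_fraction):
    """Boolean mask that keeps the highest-scoring fraction of entries."""
    flat = scores.ravel()
    k = max(1, int(round(keep_fraction * flat.size)))
    keep = np.argpartition(flat, -k)[-k:]  # indices of the k largest scores
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(scores.shape)

def masked_layer(x, W, b, score_W, score_b, keep_fraction):
    """ReLU layer whose weights AND biases are pruned by their scores.

    Assumption: weight and bias scores are ranked jointly, so a bias
    competes with the weights of its layer for a place in the ticket.
    """
    scores = np.concatenate([score_W.ravel(), score_b.ravel()])
    mask = top_k_mask(scores, keep_fraction)
    mask_W = mask[:W.size].reshape(W.shape)
    mask_b = mask[W.size:]
    h = (W * mask_W) @ x + b * mask_b  # the random parameters stay fixed
    return np.maximum(h, 0.0), mask_W, mask_b

# Hypothetical schedule: lower the kept fraction gradually instead of
# pruning to the target sparsity in a single step.
keep_schedule = np.linspace(1.0, 0.1, num=5)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
score_W, score_b = rng.random(size=(4, 3)), rng.random(size=4)
for keep in keep_schedule:
    out, mask_W, mask_b = masked_layer(np.ones(3), W, b, score_W, score_b, keep)
    print(f"keep={keep:.3f} weights={mask_W.sum()} biases={mask_b.sum()}")
```

In an actual edge-popup-style method the scores would be trained while `W` and `b` stay frozen; only the masking step is shown here.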
### Formula summary

- **ReLU network output formula**:
  \[ f(x) = \phi(h), \quad h = Wx + b \]
  where \(\phi(x) = \max(0, x)\) is the ReLU activation function, \(W\) is the weight matrix, and \(b\) is the bias vector.
- **Error propagation in the Strong Lottery Ticket Hypothesis**:
  \[ \epsilon_l = \epsilon \left( L \sqrt{n_l k_{l,\text{max}} \left( 1 + \sup_{x \in [-1,1]^{n_0}} \|x^{(l)}\|_1 \right)} \right)^{-1} \left( \prod_{k=l+1}^{L} \left( \|W^{(k)}\|_\infty + \epsilon/L \right) \right)^{-1} \]
- **Output scaling factor**:
  \[ \lambda = \prod_{l=1}^{L} \sigma_w^{-1} \]

### Conclusion

By introducing nonzero-bias initialization and improving the pruning algorithm, this paper broadens the applicability of the Strong Lottery Ticket Hypothesis, strengthens its theoretical support, and offers new methods for efficient model pruning in practice.
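Returning to the layer formula \(f(x) = \phi(Wx + b)\) from the formula summary: the limitation of zero biases can be checked numerically. With \(b = 0\) in every layer, a ReLU network is positively homogeneous, so \(f(\alpha x) = \alpha f(x)\) for \(\alpha \ge 0\) and in particular \(f(0) = 0\), whatever the weights are. The sketch below is illustrative only; `relu_net`, the layer sizes, and the random weights are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def relu_net(x, weights, biases):
    """Apply f(x) = phi(W x + b) layer by layer, with phi = ReLU."""
    for W, b in zip(weights, biases):
        x = np.maximum(W @ x + b, 0.0)
    return x

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 2]  # hypothetical layer widths
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
zero_biases = [np.zeros(m) for m in sizes[1:]]
x = rng.normal(size=sizes[0])

# Zero biases: scaling the input by alpha >= 0 scales the output by alpha.
scaled_in = relu_net(2.0 * x, weights, zero_biases)
scaled_out = 2.0 * relu_net(x, weights, zero_biases)
print(np.allclose(scaled_in, scaled_out))

# Zero biases: the origin is always mapped to zero.
print(relu_net(np.zeros(sizes[0]), weights, zero_biases))

# A single layer with bias 1 maps the origin to a nonzero value,
# which no zero-bias network can reproduce.
print(relu_net(np.zeros(sizes[0]), weights[:1], [np.ones(sizes[1])]))
```

Any target function with \(f(0) \neq 0\) is therefore out of reach for a zero-bias ReLU network, which is why the bias terms matter for universal approximation.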

Lottery Tickets with Nonzero Biases

Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning

When Layers Play the Lottery, all Tickets Win at Initialization

Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP

On the Existence of Universal Lottery Tickets

Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks

Most Activation Functions Can Win the Lottery Without Excessive Depth

No Free Prune: Information-Theoretic Barriers to Pruning at Initialization

Plant 'n' Seek: Can You Find the Winning Ticket?

Winning Lottery Tickets in Deep Generative Models

Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets

Randomly Initialized Subnetworks with Iterative Weight Recycling

Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking Neural Network

Juvenile state hypothesis: What we can learn from lottery ticket hypothesis researches?

Dual Lottery Ticket Hypothesis

Rethinking Graph Lottery Tickets: Graph Sparsity Matters

Rare Gems: Finding Lottery Tickets at Initialization

One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks

Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis

Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective