Lottery Tickets with Nonzero Biases

Jonas Fischer, Advait Gadhikar, Rebekka Burkholz
DOI: https://doi.org/10.48550/arXiv.2110.11150
2022-06-07
Abstract: The strong lottery ticket hypothesis holds the promise that pruning randomly initialized deep neural networks could offer a computationally efficient alternative to deep learning with stochastic gradient descent. Common parameter initialization schemes and existence proofs, however, are focused on networks with zero biases, thus foregoing the potential universal approximation property of pruning. To fill this gap, we extend multiple initialization schemes and existence proofs to nonzero biases, including explicit 'looks-linear' approaches for ReLU activation functions. These do not only enable truly orthogonal parameter initialization but also reduce potential pruning errors. In experiments on standard benchmark data, we further highlight the practical benefits of nonzero bias initialization schemes, and present theoretically inspired extensions for state-of-the-art strong lottery ticket pruning.
Machine Learning, Artificial Intelligence
### What problem does this paper attempt to address?

This paper addresses nonzero-bias initialization in the Strong Lottery Ticket Hypothesis (SLTH). Specifically, it aims to:

1. **Demonstrate the limitations of zero-bias initialization**: Most existing SLTH existence proofs and initialization schemes focus on networks with zero biases, which forgo the universal approximation property. For example, a ReLU network without biases always maps the zero input to zero, so it cannot approximate a continuous function that is nonzero at the origin.
2. **Extend the Strong Lottery Ticket Hypothesis to networks with nonzero biases**: To fill this gap, the paper extends several common initialization schemes to nonzero biases and proves that the SLTH still holds in this setting. With appropriate initialization and pruning, strong lottery tickets (sparse subnetworks of the randomly initialized network) that contain nonzero biases can be found, which retains the potential for universal approximation.
3. **Improve the pruning algorithm to discover strong lottery tickets with biases**: Existing pruning algorithms such as edge-popup cannot recover bias parameters and find only relatively dense tickets. The paper therefore proposes several improvements, including extending the popup scores to bias terms and gradually adjusting the sparsity of the network, in order to find sparser, higher-quality strong lottery tickets.
4. **Verify the practical feasibility of the theoretical results**: Experiments with the proposed nonzero-bias initialization schemes and the extended pruning algorithm indicate that these methods can find highly sparse strong lottery tickets that perform well on standard benchmark data.
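Points 1 and 3 above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the layer sizes, the initialization scale, the helper names (`forward`, `topk_mask`, `prune_layer`), and the choice to rank weight and bias scores jointly within a layer are all assumptions made for illustration, and the scores are random placeholders rather than scores learned as in edge-popup.

```python
import numpy as np

def forward(x, weights, biases):
    """ReLU MLP with a linear output layer: h = W x + b, then ReLU on hidden layers."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = W @ h + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def topk_mask(scores, keep_fraction):
    """Binary mask that keeps exactly the top `keep_fraction` of entries by score."""
    flat = scores.ravel()
    k = max(1, int(round(keep_fraction * flat.size)))
    mask = np.zeros(flat.size)
    mask[np.argsort(flat)[-k:]] = 1.0
    return mask.reshape(scores.shape)

def prune_layer(W, b, score_W, score_b, keep_fraction):
    """Mask weights AND biases of one layer.

    Assumption: weight and bias scores are ranked jointly within the layer.
    """
    scores = np.concatenate([score_W.ravel(), score_b.ravel()])
    mask = topk_mask(scores, keep_fraction)
    mask_W = mask[:W.size].reshape(W.shape)
    mask_b = mask[W.size:]
    return W * mask_W, b * mask_b

rng = np.random.default_rng(0)
sizes = [3, 16, 16, 1]  # hypothetical architecture
weights = [rng.normal(size=(m, n)) * np.sqrt(2.0 / n)
           for n, m in zip(sizes[:-1], sizes[1:])]
zero_biases = [np.zeros(m) for m in sizes[1:]]
rand_biases = [0.1 * rng.normal(size=m) for m in sizes[1:]]

# Point 1: without biases the network is positively homogeneous,
# f(a x) = a f(x) for a > 0, and therefore f(0) = 0.
x = rng.uniform(-1.0, 1.0, size=sizes[0])
f_at_zero = forward(np.zeros(sizes[0]), weights, zero_biases)
is_homogeneous = np.allclose(forward(2.5 * x, weights, zero_biases),
                             2.5 * forward(x, weights, zero_biases))
# With nonzero biases the output at the origin is no longer pinned to zero.
f_biased_at_zero = forward(np.zeros(sizes[0]), weights, rand_biases)

# Point 3: scores exist for bias entries too, so biases can be pruned or kept.
W0, b0 = weights[0], rand_biases[0]
score_W = rng.uniform(size=W0.shape)  # placeholder scores, not learned
score_b = rng.uniform(size=b0.shape)
W_pruned, b_pruned = prune_layer(W0, b0, score_W, score_b, keep_fraction=0.3)

print("zero-bias output at origin:", f_at_zero)
print("positively homogeneous:", is_homogeneous)
print("nonzero-bias output at origin:", f_biased_at_zero)
print("kept parameters in layer 1:",
      np.count_nonzero(W_pruned) + np.count_nonzero(b_pruned))
```

In edge-popup itself the scores are trained while the random weights stay fixed; the sketch only shows how a mask derived from scores can cover bias entries alongside weights.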
### Formula summary

- **ReLU network output formula**:
  \[ f(x)=\phi(h),\quad h = Wx + b \]
  where \(\phi(x)=\max(0,x)\) is the ReLU activation function, \(W\) is the weight matrix, and \(b\) is the bias vector.
- **Error propagation in the Strong Lottery Ticket Hypothesis**:
  \[ \epsilon_l=\epsilon\left(L\sqrt{n_l k_{l,\text{max}}\left(1 + \sup_{x\in[-1,1]^{n_0}}\|x^{(l)}\|_1\right)}\right)^{-1}\left(\prod_{k = l + 1}^{L}\left(\|W^{(k)}\|_\infty+\epsilon/L\right)\right)^{-1} \]
- **Output scaling factor**:
  \[ \lambda=\prod_{l = 1}^L\sigma_w^{-1} \]

### Conclusion

By introducing nonzero-bias initialization and improving the pruning algorithm, the paper broadens the applicability of the Strong Lottery Ticket Hypothesis, strengthens its theoretical support, and offers new approaches to efficient model pruning in practice.