Stably unactivated neurons in ReLU neural networks

Natalie Brownlowe, Christopher R. Cornwell, Ethan Montes, Gabriel Quijano, Na Zhang
2024-12-07
Abstract: The choice of architecture of a neural network influences which functions will be realizable by that neural network and, as a result, studying the expressiveness of a chosen architecture has received much attention. In ReLU neural networks, the presence of stably unactivated neurons can reduce the network's expressiveness. In this work, we investigate the probability of a neuron in the second hidden layer of such neural networks being stably unactivated when the weights and biases are initialized from symmetric probability distributions. For networks with input dimension $n_0$, we prove that if the first hidden layer has $n_0+1$ neurons then this probability is exactly $\frac{2^{n_0}+1}{4^{n_0+1}}$, and if the first hidden layer has $n_1$ neurons, $n_1 \le n_0$, then the probability is $\frac{1}{2^{n_1+1}}$. Finally, for the case when the first hidden layer has more neurons than $n_0+1$, a conjecture is proposed along with the rationale. Computational evidence is presented to support the conjecture.
Machine Learning, Probability
### What problem does this paper attempt to solve?

This paper studies how stably unactivated neurons in ReLU neural networks affect the expressiveness of the network. Specifically, the authors compute the probability that a neuron in the second hidden layer is stably unactivated when the weights and biases are initialized from symmetric probability distributions.

#### Main problems

1. **Definition and influence of stably unactivated neurons**:
   - A neuron is stably unactivated if it is unactivated (its output is 0) for every input, and this remains true throughout an open neighborhood of the current point in the network's parameter space. Such neurons reduce the expressiveness of the network, that is, the range of functions the network can realize.
2. **Probability calculation**:
   - The main goal of the paper is to calculate the probability that a neuron in the second hidden layer is stably unactivated under a given network architecture with input dimension \(n_0\). Specifically:
     - If the first hidden layer has \(n_0 + 1\) neurons, then the probability is exactly \(\frac{2^{n_0} + 1}{4^{n_0 + 1}}\).
     - If the first hidden layer has \(n_1\) neurons (\(n_1 \leq n_0\)), then the probability is \(\frac{1}{2^{n_1 + 1}}\).
     - For the case where the first hidden layer has more than \(n_0 + 1\) neurons, a conjecture is proposed, along with its rationale and supporting computational evidence.
3. **Functional dimension and redundancy**:
   - By studying the probability of stably unactivated neurons, the paper also touches on the functional dimension at a point in parameter space, that is, the number of degrees of freedom of the realized function under small perturbations of the parameters. Stably unactivated neurons limit the functional dimension of the network, which can in turn affect training.
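To make the two closed-form probabilities stated in the abstract concrete, the following minimal Python sketch evaluates them exactly and adds a small Monte Carlo check for the case \(n_1 \leq n_0\). The function names are hypothetical and are not taken from the paper. The simulation rests on an assumption that this summary does not spell out: for \(n_1 \leq n_0\) and generic first-layer weights, the first hidden layer can output any point of the nonnegative orthant, so the second-layer neuron is unactivated for all inputs exactly when its bias and all of its incoming weights are negative. A standard normal distribution is used only as one example of a symmetric distribution.

```python
import random
from fractions import Fraction


def prob_wide(n0: int) -> Fraction:
    """Probability from the abstract when the first hidden layer has n0 + 1 neurons."""
    return Fraction(2**n0 + 1, 4 ** (n0 + 1))


def prob_narrow(n1: int) -> Fraction:
    """Probability from the abstract when the first hidden layer has n1 <= n0 neurons."""
    return Fraction(1, 2 ** (n1 + 1))


def estimate_narrow(n1: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate for the n1 <= n0 case.

    Assumption (not from the paper's text): the neuron is stably unactivated
    exactly when its bias and all n1 incoming weights are negative.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Symmetric initialization: standard normal weights and bias.
        weights = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        bias = rng.gauss(0.0, 1.0)
        if bias < 0 and all(w < 0 for w in weights):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, prob_wide(n), prob_narrow(n), round(estimate_narrow(n), 4))
```

Exact fractions are used for the closed forms so that small values of \(n_0\) and \(n_1\) can be compared without rounding; the simulated estimate should only be expected to agree with `prob_narrow` up to sampling error.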
#### Formula summary

- When the first hidden layer has \(n_0 + 1\) neurons, the probability that a second-layer neuron is stably unactivated is:
  \[ P = \frac{2^{n_0} + 1}{4^{n_0 + 1}} \]
- When the first hidden layer has \(n_1\) neurons (\(n_1 \leq n_0\)), the probability is:
  \[ P = \frac{1}{2^{n_1 + 1}} \]

#### Conclusion

With these results, the authors aim to clarify how stably unactivated neurons in ReLU neural networks affect expressiveness, and to provide a theoretical basis for choosing network architectures and initialization strategies.