Abstract: The choice of architecture of a neural network influences which functions will be realizable by that neural network and, as a result, studying the expressiveness of a chosen architecture has received much attention. In ReLU neural networks, the presence of stably unactivated neurons can reduce the network's expressiveness. In this work, we investigate the probability of a neuron in the second hidden layer of such neural networks being stably unactivated when the weights and biases are initialized from symmetric probability distributions. For networks with input dimension $n_0$, we prove that if the first hidden layer has $n_0+1$ neurons then this probability is exactly $\frac{2^{n_0}+1}{4^{n_0+1}}$, and if the first hidden layer has $n_1$ neurons, $n_1 \le n_0$, then the probability is $\frac{1}{2^{n_1+1}}$. Finally, for the case when the first hidden layer has more neurons than $n_0+1$, a conjecture is proposed along with the rationale. Computational evidence is presented to support the conjecture.

### What problem does this paper attempt to solve?

This paper studies how stably unactivated neurons in ReLU neural networks affect the expressiveness of the network. Specifically, the authors compute the probability that a neuron in the second hidden layer is stably unactivated when the weights and biases are initialized from symmetric probability distributions.

#### Main problems

1. **Definition and influence of stably unactivated neurons**:
   - A stably unactivated neuron is one whose output is 0 for every input, and which remains so for all parameters in an open neighborhood of the current point in the network's parameter space. Such neurons reduce the expressiveness of the network, that is, the range of functions the network can realize.
2. **Probability calculation**:
   - The main goal of the paper is to calculate the probability that a neuron in the second hidden layer is stably unactivated under a given network architecture with input dimension $n_0$. Specifically:
     - If the first hidden layer has $n_0 + 1$ neurons, then the probability is $\frac{2^{n_0} + 1}{4^{n_0 + 1}}$.
     - If the first hidden layer has $n_1$ neurons ($n_1\leq n_0$), then the probability is $\frac{1}{2^{n_1 + 1}}$.
     - For the case where the first hidden layer has more than $n_0 + 1$ neurons, a conjecture is proposed and computational evidence is provided to support it.
3. **Functional dimension and redundancy**:
   - By studying the probability of stably unactivated neurons, the paper further explores the functional dimension of the parameter space, that is, the number of degrees of freedom of the network function under small perturbations of the parameters. Stably unactivated neurons limit the functional dimension of the network, which can in turn affect training.

#### Formula summary

- When the first hidden layer has $n_0 + 1$ neurons, the probability that a second-hidden-layer neuron is stably unactivated is:
  \[ P=\frac{2^{n_0} + 1}{4^{n_0 + 1}} \]
- When the first hidden layer has $n_1$ neurons ($n_1\leq n_0$), the probability is:
  \[ P = \frac{1}{2^{n_1 + 1}} \]

#### Conclusion

Through these results, the authors aim to clarify how stably unactivated neurons in ReLU neural networks affect network expressiveness, and to provide a theoretical basis for choosing network architectures and initialization strategies.
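
The two closed-form probabilities stated in the abstract can be tabulated exactly with rational arithmetic. The following is a minimal sketch, not code from the paper: the function name `stably_unactivated_probability` and its argument names are hypothetical, and the case of more than $n_0 + 1$ first-layer neurons is deliberately left unimplemented because the source only states that a conjecture exists, without giving its formula here.

```python
from fractions import Fraction


def stably_unactivated_probability(n0: int, n1: int) -> Fraction:
    """Probability that a second-hidden-layer neuron is stably unactivated.

    n0 is the input dimension and n1 the width of the first hidden layer.
    Only the two cases proved in the paper (as quoted in the abstract)
    are covered; the function name and signature are illustrative.
    """
    if n0 < 1 or n1 < 1:
        raise ValueError("n0 and n1 must be positive integers")
    if n1 <= n0:
        # Narrow first layer: P = 1 / 2^(n1 + 1)
        return Fraction(1, 2 ** (n1 + 1))
    if n1 == n0 + 1:
        # First layer has exactly n0 + 1 neurons: P = (2^n0 + 1) / 4^(n0 + 1)
        return Fraction(2 ** n0 + 1, 4 ** (n0 + 1))
    # Wider first layers are only covered by a conjecture in the paper.
    raise NotImplementedError("no proved formula for n1 > n0 + 1")


if __name__ == "__main__":
    for n0 in range(1, 5):
        for n1 in range(1, n0 + 2):
            p = stably_unactivated_probability(n0, n1)
            print(f"n0={n0} n1={n1} P={p} (~{float(p):.5f})")
```

Using `Fraction` keeps the values exact, which makes it easy to compare them against Monte Carlo estimates of the kind the paper reports as computational evidence.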

Stably unactivated neurons in ReLU neural networks

Neural networks with ReLU powers need less depth

Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training

On the Principles of ReLU Networks with One Hidden Layer

Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals

Learning Narrow One-Hidden-Layer ReLU Networks

ReLUs Are Sufficient for Learning Implicit Neural Representations

Towards Understanding the Condensation of Neural Networks at Initial Training

How Implicit Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part I: the 1-D Case of Two Layers with Random First Layer

ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models

Principles for Initialization and Architecture Selection in Graph Neural Networks with ReLU Activations

Nonparametric regression using deep neural networks with ReLU activation function

1-Lipschitz Neural Networks are more expressive with N-Activations

Phase Diagram for Two-layer ReLU Neural Networks at Infinite-width Limit.

Hidden Unit Specialization in Layered Neural Networks: ReLU vs. Sigmoidal Activation

When Are Bias-Free ReLU Networks Like Linear Networks?

Most Activation Functions Can Win the Lottery Without Excessive Depth

On the Expressive Power of Neural Networks

Towards Lower Bounds on the Depth of ReLU Neural Networks

Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization

Singular Values for ReLU Layers