Abstract: We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent $3/2$. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).

What problem does this paper attempt to address?

This paper studies the properties of shallow ReLU neural networks with randomly initialized parameters, and in particular their characterization as non-Gaussian processes. The main contributions can be summarized as follows:

1. **Define random ReLU neural networks**:
   - The paper defines a class of shallow ReLU neural networks with randomly initialized parameters and formalizes it as a Poisson-type random function. Such a network can be represented as: \[ s_{\text{ReLU}}(x)=\sum_{k \in \mathbb{Z}} v_k\left[\text{ReLU}(w_k^T x - b_k)+c_k^T x + c_{0,k}\right],\quad x\in \mathbb{R}^d, \] where $\text{ReLU}(t)=\max\{0,t\}$, the $v_k$ are independent and identically distributed (i.i.d.) random variables, the $(w_k, b_k)$ are random variables satisfying specific conditions, and $c_k$ and $c_{0,k}$ are correction terms that ensure the sum converges almost everywhere.

2. **Prove that random ReLU neural networks are non-Gaussian processes**:
   - The paper proves that these random ReLU neural networks are well-defined non-Gaussian processes. The proof shows that the networks are solutions to stochastic differential equations driven by impulsive white noise. The processes are parameterized by the law of the weights and biases and by the density of activation thresholds within each bounded region of the input domain.

3. **Study the statistical properties of the processes**:
   - The paper derives the first-order and second-order statistics of these random ReLU neural networks, especially their autocovariance function, for which it obtains a simple closed-form expression: \[ C_{s_{\text{ReLU}}}(x,y)=A\lambda E[V^2]\left(\|x - y\|^{3}-\|x\|^{3}-\|y\|^{3}+3x^T y(\|x\|+\|y\|)\right), \] where $A = \frac{\Gamma(-3/2)}{2^{d + 3}\pi^{d/2}\Gamma((d + 3)/2)}$, $\lambda$ is the density parameter, $V$ has the law of the weights $v_k$, and $\Gamma(\cdot)$ is the Euler gamma function. Every term is homogeneous of degree 3 in $(x,y)$, which is consistent with wide-sense self-similarity with Hurst exponent $3/2$.

4. **Study the asymptotic behavior of the processes**:
   - The paper studies the limit behavior of these random ReLU neural networks as the expected network width tends to infinity. It proves that, under certain conditions, the processes can converge to either Gaussian or non-Gaussian processes. Specifically:
     - When the distribution of the weights is Gaussian and the variance is inversely proportional to $\lambda$, the process converges to a Gaussian process.
     - When the distribution of the weights is a symmetric $\alpha$-stable distribution and the scale parameter is proportional to $\lambda^{-1/\alpha}$, the process converges to a non-Gaussian process.

Together, these results offer a new perspective on the limit behavior of wide networks: they recover the classical Gaussian process limits and add new non-Gaussian ones. They also characterize the statistics of randomly initialized networks at finite (random) width, not only in the infinite-width limit.
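The network definition summarized above can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions, not the paper's construction: the function name `sample_random_relu_network`, the choice of directions uniform on the unit sphere, thresholds uniform in `[-radius, radius]`, the proportionality constant `2 * radius` in the Poisson mean, and the specific correction terms (the first-order Taylor expansion of each ReLU at the origin) are all hypothetical choices made here for concreteness.

```python
import numpy as np


def sample_random_relu_network(d, lam, radius, rng, weight_sampler=None):
    """Draw one realization of a shallow random ReLU network on a ball.

    Assumptions (illustrative, not taken from the paper):
      - the width is Poisson with mean lam * 2 * radius,
      - directions w_k are uniform on the unit sphere in R^d,
      - thresholds b_k are uniform in [-radius, radius],
      - each ReLU is corrected by its first-order Taylor expansion at x = 0.
    """
    n = rng.poisson(lam * 2.0 * radius)  # random width
    w = rng.standard_normal((n, d))
    w /= np.linalg.norm(w, axis=1, keepdims=True)  # unit-norm directions
    b = rng.uniform(-radius, radius, size=n)  # activation thresholds
    if weight_sampler is None:
        v = rng.standard_normal(n)  # default: standard Gaussian output weights
    else:
        v = weight_sampler(rng, n)

    def s(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))  # shape (m, d)
        proj = x @ w.T  # w_k^T x, shape (m, n)
        relu = np.maximum(proj - b, 0.0)
        # Assumed correction: value and gradient of ReLU(w^T x - b) at x = 0.
        active_at_origin = (b < 0).astype(float)
        correction = np.maximum(-b, 0.0) + active_at_origin * proj
        return (relu - correction) @ v  # shape (m,)

    return s, {"n": n, "w": w, "b": b, "v": v}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam = 50.0
    # Gaussian weights whose variance is inversely proportional to lam.
    gaussian = lambda rng, n: rng.standard_normal(n) / np.sqrt(lam)
    s, params = sample_random_relu_network(2, lam, 1.0, rng, gaussian)
    grid = np.linspace(-1.0, 1.0, 5)
    points = np.stack([grid, np.zeros_like(grid)], axis=1)
    print("width:", params["n"])
    print("values along the first axis:", s(points))
```

With this choice of correction, a neuron contributes nothing at points `x` whose norm is smaller than `|b_k|`, so only neurons with thresholds inside the ball of radius `‖x‖` affect `s(x)`. That is one way to see why correction terms of this kind keep the sum well-behaved. Passing a different `weight_sampler` (for example a heavy-tailed one) changes the law of the weights without touching the rest of the sketch.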

Random ReLU Neural Networks as Non-Gaussian Processes

Stochastic Neural Networks with Infinite Width are Deterministic

Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes

Deep quantum neural networks form Gaussian processes

Deep Kernel Posterior Learning under Infinite Variance Prior Weights

Deep Neural Networks as Gaussian Processes

Proportional infinite-width infinite-depth limit for deep linear neural networks

Quantitative CLTs in Deep Neural Networks

Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

Normal approximation of Random Gaussian Neural Networks

Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models

Large Deviations of Gaussian Neural Networks with ReLU activation

Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training

Neural Network Gaussian Processes by Increasing Depth

On Random Matrices Arising in Deep Neural Networks: General I.I.D. Case

Posterior Inference on Shallow Infinitely Wide Bayesian Neural Networks under Weights with Unbounded Variance

A Unified Theory of Quantum Neural Network Loss Landscapes

Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility

Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities

Invariance of Weight Distributions in Rectified MLPs