Large and moderate deviations for Gaussian neural networks

Claudio Macci, Barbara Pacchiarotti, Giovanni Luca Torrisi

2024-06-24

Abstract: We prove large and moderate deviations for the output of Gaussian fully connected neural networks. The main achievements concern deep neural networks (i.e., when the model has more than one hidden layer) and hold for bounded and continuous pre-activation functions. However, for deep neural networks fed by a single input, we have results even if the pre-activation is ReLU. When the network is shallow (i.e., there is exactly one hidden layer) the large and moderate principles hold for quite general pre-activations and in an infinite-dimensional setting.

Probability

What problem does this paper attempt to address?

The paper studies large and moderate deviations for the output of Gaussian fully connected neural networks, that is, networks whose parameters follow a Gaussian distribution. It covers both deep networks (more than one hidden layer) and shallow networks (exactly one hidden layer). The authors prove that, under specific normalizing sequences, the outputs of these networks satisfy large and moderate deviation principles, which estimate the probabilities of rare events on an exponential scale. The main contributions of the paper include:

1. A large deviation principle for the output sequences of deep Gaussian fully connected neural networks, with speed v(n) and rate function I_{Z^{(L+1)}}(x).
2. A moderate deviation principle for the output sequences, valid for positive sequences a_n such that a_n tends to 0 and a_n v(n) tends to infinity, with speed 1/a_n and a rate function given approximately by I_{Z^{(L+1)}}(x).
3. Large and moderate deviation principles for the specific case of a deep network with a single input and ReLU pre-activation.

For shallow networks, the large and moderate deviation principles hold for quite general pre-activations and in an infinite-dimensional setting. These results help to close a theoretical gap in understanding deep neural networks at the early stages of training, in particular the atypical behavior of the network output when the parameters are randomly initialized. They may also be relevant to the design of optimization algorithms and to improving the efficiency of neural networks.
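To make the object of study concrete, the following is a minimal simulation sketch, not the authors' method. It draws the scalar output of a Gaussian fully connected network at random initialization and counts how often its absolute value exceeds a threshold, which is the kind of rare-event probability that a deviation principle describes on an exponential scale. The parameterization is an assumption made for illustration (weights with variance `c_w / n_in`, biases with variance `c_b`, `tanh` as a bounded and continuous pre-activation), and the names `sample_output` and `tail_frequency` are hypothetical. The paper's normalizing sequences are not reproduced here.

```python
import math
import random


def sample_output(x, widths, c_w=1.0, c_b=0.0, act=math.tanh, rng=random):
    """Draw one scalar output of a Gaussian fully connected network at input x.

    Assumed parameterization (illustrative only): each weight is N(0, c_w / n_in),
    each bias is N(0, c_b), and `act` is applied to every hidden layer.
    """
    layer_in = list(x)
    sizes = list(widths) + [1]  # hidden widths, then a single output unit
    for depth, n_out in enumerate(sizes):
        std_w = math.sqrt(c_w / len(layer_in))
        std_b = math.sqrt(c_b)
        # Pre-activations of this layer: bias plus Gaussian-weighted sum of inputs.
        z = [
            rng.gauss(0.0, std_b) + sum(rng.gauss(0.0, std_w) * v for v in layer_in)
            for _ in range(n_out)
        ]
        if depth == len(sizes) - 1:
            return z[0]  # the output layer is not passed through `act`
        layer_in = [act(v) for v in z]


def tail_frequency(threshold, n_samples, x, widths, seed=0):
    """Empirical frequency of |output| >= threshold over independent initializations."""
    rng = random.Random(seed)
    hits = sum(
        abs(sample_output(x, widths, rng=rng)) >= threshold for _ in range(n_samples)
    )
    return hits / n_samples


if __name__ == "__main__":
    x = [1.0]  # a single one-dimensional input
    for t in (0.5, 1.0, 2.0):
        print(t, tail_frequency(t, 2000, x, widths=[20, 20]))
```

A plain Monte Carlo estimate like this becomes uninformative once the event is rare enough that no sample hits it, which is the regime where rate-function results of the kind summarized above are useful.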

Large Deviations of Gaussian Neural Networks with ReLU activation

Quantitative CLTs in Deep Neural Networks

Normal approximation of Random Gaussian Neural Networks

Large deviations of one-hidden-layer neural networks

Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training

Random ReLU Neural Networks as Non-Gaussian Processes

Proportional infinite-width infinite-depth limit for deep linear neural networks

Large Deviations for High Minima of Gaussian Processes with Nonnegatively Correlated Increments

Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions

Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions

Large deviations for conditionally Gaussian processes: estimates of level crossing probability

Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes

Large deviation analysis of function sensitivity in random deep neural networks

Central Limit Theorem for Bayesian Neural Network trained with Variational Inference

Gaussian Pre-Activations in Neural Networks: Myth or Reality?

A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions

Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods

Deep Neural Networks as Gaussian Processes

Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks

Wide Neural Networks with Bottlenecks are Deep Gaussian Processes