Large and moderate deviations for Gaussian neural networks

Claudio Macci, Barbara Pacchiarotti, Giovanni Luca Torrisi
2024-06-24
Abstract: We prove large and moderate deviations for the output of Gaussian fully connected neural networks. The main achievements concern deep neural networks (i.e., when the model has more than one hidden layer) and hold for bounded and continuous pre-activation functions. However, for deep neural networks fed by a single input, we have results even if the pre-activation is ReLU. When the network is shallow (i.e., there is exactly one hidden layer) the large and moderate deviation principles hold for quite general pre-activations and in an infinite-dimensional setting.
Probability
What problem does this paper attempt to address?
The paper studies large and moderate deviations for the output of Gaussian fully connected neural networks, that is, networks whose parameters follow a Gaussian distribution. It covers both deep neural networks (more than one hidden layer) and shallow neural networks (exactly one hidden layer). The authors prove that, under specific normalizing sequences, the outputs of these networks satisfy large and moderate deviation principles, which describe on an exponential scale how fast the probabilities of rare events decay.

The main contributions of the paper include:
1. A large deviation principle for the output sequences of deep Gaussian fully connected neural networks, with speed v(n) and rate function I_{Z^(L+1)}(x).
2. A moderate deviation principle for the output sequences: for positive sequences a_n such that a_n tends to 0 and a_n v(n) tends to infinity, the principle holds with speed 1/a_n and a rate function approximately equal to I_{Z^(L+1)}(x).
3. Large and moderate deviation principles in the special case of a deep network fed by a single input with ReLU pre-activation.

These results help close a theoretical gap in understanding the behavior of deep neural networks in the early stages of training, in particular the atypical behavior of network outputs when the parameters are randomly initialized. They may also be relevant to the design of optimization algorithms and to improving neural network efficiency.
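To make the setting concrete, the following is a minimal simulation sketch of a Gaussian fully connected network at random initialization. It is not the paper's construction: the function names `sample_outputs` and `tail_probability`, the variance scaling `c_w / n_in` for the weights, the bias variance `c_b`, and the choice of `tanh` as a bounded continuous pre-activation are all assumptions made for illustration. The sketch draws many independent networks, evaluates each on the same input, and estimates how rarely the scalar output exceeds a threshold.

```python
import numpy as np


def sample_outputs(x, widths, n_samples, activation=np.tanh,
                   c_w=1.0, c_b=0.0, seed=0):
    """Scalar outputs of n_samples independent Gaussian networks on input x.

    Assumed model (illustrative): weights ~ N(0, c_w / n_in) and
    biases ~ N(0, c_b), drawn independently in every layer.
    """
    rng = np.random.default_rng(seed)
    # One copy of the input per independently sampled network.
    h = np.tile(np.asarray(x, dtype=float), (n_samples, 1))
    dims = list(widths) + [1]  # hidden layer widths, then a scalar output
    for layer, n_out in enumerate(dims):
        n_in = h.shape[1]
        w = rng.normal(0.0, np.sqrt(c_w / n_in), size=(n_samples, n_in, n_out))
        b = rng.normal(0.0, np.sqrt(c_b), size=(n_samples, n_out))
        if layer > 0:
            # The nonlinearity acts on the previous layer's values,
            # never on the raw input.
            h = activation(h)
        # Batched affine map: a separate weight matrix for each sample.
        h = np.einsum("si,sio->so", h, w) + b
    return h[:, 0]


def tail_probability(outputs, threshold):
    """Empirical estimate of P(|output| > threshold)."""
    return float(np.mean(np.abs(outputs) > threshold))


if __name__ == "__main__":
    out = sample_outputs(x=[1.0, -0.5], widths=(30, 30), n_samples=2000)
    for t in (0.5, 1.0, 2.0):
        print(f"P(|Z| > {t}) is roughly {tail_probability(out, t):.4f}")
```

Plain Monte Carlo of this kind only resolves moderately rare events. The deviation principles summarized above concern the exponential decay rate of such probabilities as the layer widths grow, which a simulation like this cannot establish.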