What problem does this paper attempt to address?

This paper studies large deviations of Gaussian neural networks with ReLU activation functions. The author, Quirin Vogel, proves that deep neural networks satisfy a large deviation principle as the number of parameters grows, for activation functions with linear growth such as ReLU. The paper extends previous work, which only considered bounded and continuous activation functions, and simplifies the previously given expression for the rate function.

The paper first defines the model: a neural network made up of multiple layers with weights and biases, using an activation function of linear growth such as ReLU. The author assumes that the weights and biases follow Gaussian distributions, introduces a method for passing from a "conditional large deviation principle" to a "global large deviation principle," and points out that ReLU and other common activation functions satisfy the required assumptions.

The main contributions of the paper include:
1. Proving that under ReLU activation, the random output vector of the network satisfies a large deviation principle, which quantifies its atypical behavior even in high-dimensional cases.
2. Providing a simplified form of the rate function, which reduces the complexity of the associated optimization problem, and giving a power series expansion for the ReLU case, which helps with approximate calculations in high dimensions.
3. Discussing the key differences between activation functions of linear growth and those of super-linear or sub-linear growth; for faster-growing functions, the large deviation principle is no longer of the exponential class.

The paper also discusses relevant theoretical tools, such as Gaussian processes and convex analysis, used to handle the continuity issues of non-trivial gradients and conditional large deviation principles.
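For readers unfamiliar with the term, the standard textbook definition of a large deviation principle can be written as follows. The symbols are generic placeholders and not the paper's own notation: $Z_n$ is a sequence of random vectors, $v_n \to \infty$ is the speed, and $I$ is a lower semicontinuous rate function. The specific speed and rate function derived in the paper are not reproduced here.

```latex
-\inf_{x \in A^{\circ}} I(x)
\;\le\; \liminf_{n \to \infty} \frac{1}{v_n} \log \mathbb{P}(Z_n \in A)
\;\le\; \limsup_{n \to \infty} \frac{1}{v_n} \log \mathbb{P}(Z_n \in A)
\;\le\; -\inf_{x \in \overline{A}} I(x)
\qquad \text{for every Borel set } A,
```

where $A^{\circ}$ denotes the interior and $\overline{A}$ the closure of $A$. Informally, the probability of an atypical outcome $x$ decays like $\exp(-v_n I(x))$.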
In addition, the author points out that these results apply to the network before training, i.e., with randomly initialized weights and biases. In summary, by extending the study of Gaussian neural networks to ReLU activation, the paper provides a theoretical basis for understanding the statistical behavior of deep learning models with a large number of parameters. This is relevant to understanding and optimizing the training of deep learning models.
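The setting described above, a network with Gaussian weights and biases and ReLU activation at initialization, can be illustrated with a minimal sketch. The function name `sample_output`, the parameters `weight_var` and `bias_var`, and the choice to scale the weight variance by the inverse input width of each layer are assumptions made for illustration; the summary does not state the paper's exact normalization.

```python
import numpy as np


def relu(z):
    """ReLU activation, applied elementwise."""
    return np.maximum(z, 0.0)


def sample_output(x, widths, out_dim, rng, weight_var=1.0, bias_var=1.0):
    """Draw one random network and return its output at input x.

    Assumed (hypothetical) scaling: weights ~ N(0, weight_var / n_in),
    biases ~ N(0, bias_var). ReLU is applied after every hidden layer,
    but not after the final output layer.
    """
    h = np.asarray(x, dtype=float)
    layer_sizes = list(widths) + [out_dim]
    for i, n_out in enumerate(layer_sizes):
        n_in = h.shape[0]
        W = rng.normal(0.0, np.sqrt(weight_var / n_in), size=(n_out, n_in))
        b = rng.normal(0.0, np.sqrt(bias_var), size=n_out)
        h = W @ h + b
        if i < len(layer_sizes) - 1:
            h = relu(h)
    return h


# Draw several independent untrained networks and evaluate each at the same input.
rng = np.random.default_rng(0)
x = np.ones(3)
samples = np.stack([sample_output(x, [64, 64], 2, rng) for _ in range(200)])
print(samples.shape)
```

Each row of `samples` is the output of an independently initialized network, so the array is an empirical sample from the distribution of the random output vector that the large deviation results describe.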

Large Deviations of Gaussian Neural Networks with ReLU activation

Large and moderate deviations for Gaussian neural networks

Large deviations of one-hidden-layer neural networks

Random ReLU Neural Networks as Non-Gaussian Processes

Large deviation analysis of function sensitivity in random deep neural networks

Large Deviations for High Minima of Gaussian Processes with Nonnegatively Correlated Increments

ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models

Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals

Neural networks with ReLU powers need less depth

Singular Values for ReLU Layers

Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions

Exponential Expressivity of ReLU$^k$ Neural Networks on Gevrey Classes with Point Singularities

Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training

Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

Classification of Data Generated by Gaussian Mixture Models Using Deep ReLU Networks

Gaussian Error Linear Units (GELUs)

Neural network integral representations with the ReLU activation function

Learning the gravitational force law and other analytic functions