Abstract: Modern neural networks (NNs) featuring a large number of layers (depth) and units per layer (width) have achieved remarkable performance across many domains. While there exists a vast literature on the interplay between infinitely wide NNs and Gaussian processes, little is known about analogous interplays with respect to infinitely deep NNs. NNs with independent and identically distributed (i.i.d.) initializations exhibit undesirable forward and backward propagation properties as the number of layers increases. To overcome these drawbacks, Peluchetti and Favaro (2020) considered fully-connected residual networks (ResNets) with network parameters initialized by means of distributions that shrink as the number of layers increases, thus establishing an interplay between infinitely deep ResNets and solutions to stochastic differential equations, i.e. diffusion processes, and showing that infinitely deep ResNets do not suffer from undesirable forward-propagation properties. In this paper, we review the results of Peluchetti and Favaro (2020), extend them to convolutional ResNets, and establish analogous backward-propagation results, which directly relate to the problem of training fully-connected deep ResNets. Then, we investigate the more general setting of doubly infinite NNs, where both the network's width and depth grow unboundedly. We focus on doubly infinite fully-connected ResNets, for which we consider i.i.d. initializations. Under this setting, we show that the dynamics of quantities of interest converge, at initialization, to deterministic limits. This allows us to provide analytical expressions for inference, both in the case of weakly trained and fully trained ResNets. Our results highlight a limited expressive power of doubly infinite ResNets when the unscaled network parameters are i.i.d. and the residual blocks are shallow.
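As a rough illustration of the depth-dependent initialization described in the abstract, the NumPy sketch below compares forward propagation through a fully-connected ResNet under two initializations: weights whose variance shrinks as 1/depth, and weights with depth-independent i.i.d. variance. The tanh activation, Gaussian weights, the absence of biases, the width of 64, and the names `resnet_forward` and `mean_square` are all assumptions made for this sketch; it is not the exact parameterization of Peluchetti and Favaro (2020).

```python
import numpy as np


def resnet_forward(x0, depth, shrink, rng, sigma_w=1.0):
    """Propagate x0 through `depth` residual layers with fresh Gaussian weights.

    Assumed update rule (illustrative only): x <- x + W @ tanh(x).
    """
    width = x0.shape[0]
    # Per-entry weight variance: sigma_w^2 / width, divided by depth when shrinking.
    var = sigma_w ** 2 / width
    if shrink:
        var /= depth
    x = x0.copy()
    for _ in range(depth):
        w = rng.normal(0.0, np.sqrt(var), size=(width, width))
        x = x + w @ np.tanh(x)
    return x


def mean_square(x):
    """Average squared activation, a simple proxy for forward-signal magnitude."""
    return float(np.mean(x ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.ones(64)
    for depth in (10, 100, 1000):
        shrunk = mean_square(resnet_forward(x0, depth, True, rng))
        plain = mean_square(resnet_forward(x0, depth, False, rng))
        print(f"depth={depth:5d}  shrinking: {shrunk:8.3f}  unscaled i.i.d.: {plain:10.3f}")
```

Under these assumptions, the mean squared activation with the shrinking variance should stay of order one as depth grows, while the unscaled i.i.d. variant grows roughly linearly with depth, which is one way to see the forward-propagation issue the abstract refers to.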

Doubly infinite residual neural networks: a diffusion process approach
