Abstract: Neural networks with wide layers have attracted significant attention due to their equivalence to Gaussian processes, enabling perfect fitting of training data while maintaining generalization performance, known as benign overfitting. However, existing results mainly focus on shallow or finite-depth networks, necessitating a comprehensive analysis of wide neural networks with infinite-depth layers, such as neural ordinary differential equations (ODEs) and deep equilibrium models (DEQs). In this paper, we specifically investigate the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers. Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process, establishing what is known as the Neural Network and Gaussian Process (NNGP) correspondence. Remarkably, this convergence holds even when the limits of depth and width are interchanged, which is not observed in typical infinite-depth Multilayer Perceptron (MLP) networks. Furthermore, we demonstrate that the associated Gaussian vector remains non-degenerate for any pairwise distinct input data, ensuring a strictly positive smallest eigenvalue of the corresponding kernel matrix using the NNGP kernel. These findings serve as fundamental elements for studying the training and generalization of DEQs, laying the groundwork for future research in this area.

What problem does this paper attempt to address?

This paper asks whether infinite-depth neural networks, such as the deep equilibrium model (DEQ), converge to a Gaussian process (GP) as the width approaches infinity, and how this convergence differs from that of finite-depth networks. Specifically, the authors focus on:

1. **Exchangeability of the infinite-depth and infinite-width limits**: Existing research mainly covers shallow or finite-depth networks. When both the depth and the width tend to infinity, can the order of the two limits be exchanged? The answer is crucial for understanding the behavior of infinite-depth neural networks.
2. **Gaussian process behavior of the DEQ**: The DEQ is an infinite-depth architecture whose weight matrices are shared across layers. The authors aim to prove that the DEQ converges to a Gaussian process as the layer width tends to infinity, and that this conclusion still holds when the depth and width limits are interchanged.
3. **Positive definiteness of the covariance function**: For the limiting Gaussian process to be non-degenerate, the covariance function must be strictly positive definite for any pairwise distinct inputs. Equivalently, the smallest eigenvalue of the kernel matrix must be strictly positive.

### Main contributions

- **Established the exchangeability of the depth and width limits**: The authors prove that, for the DEQ, the limits of depth and width can be interchanged. Whether the depth limit or the width limit is taken first, the result is the same.
- **Proved the strict positive definiteness of the covariance function**: The authors show that when the activation function is non-polynomial, the corresponding covariance function is strictly positive definite on the unit sphere. The kernel matrix is therefore non-degenerate for pairwise distinct inputs.
- **Provided a theoretical basis**: These findings lay a theoretical foundation for studying the training and generalization performance of DEQs and of infinite-depth neural networks more broadly.

### Formula summary

- Recursive definition of the covariance function:

\[
\begin{aligned}
&\Sigma_1(x, x') = \sigma_u^2 \langle x, x' \rangle / n_{in}, \\
&\Sigma_{\ell+1}(x, x') = \sigma_w^2 E[\phi(u_\ell(x)) \phi(u_\ell(x'))],
\end{aligned}
\]

  where \((u_\ell(x), u_\ell(x'))\) follows a bivariate Gaussian distribution with zero mean and covariance

\[
\text{Cov}(u_\ell(x), u_\ell(x')) =
\begin{cases}
\Sigma_1(x, x'), & \ell = 1, \\
\Sigma_\ell(x, x') + \Sigma_1(x, x'), & \ell \in [2, L - 1].
\end{cases}
\]

- Limit covariance function:

\[
\Sigma^*(x, x') = \lim_{\ell \rightarrow \infty} \Sigma_\ell(x, x').
\]

These formulas describe the behavior of the DEQ as the width tends to infinity and are the basis for the results summarized above.
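The covariance recursion in the formula summary can be iterated numerically for a pair of inputs. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a ReLU activation (a non-polynomial choice), for which the Gaussian expectation has the standard arc-cosine closed form, and it picks illustrative values \(\sigma_w^2 = 0.5\) and \(\sigma_u^2 = 1\) so that the iteration contracts. The function names `relu_expectation` and `deq_nngp_kernel` are hypothetical.

```python
import numpy as np

def relu_expectation(K):
    """E[relu(u) relu(u)^T] for u ~ N(0, K), K a 2x2 covariance.

    Uses the standard arc-cosine closed form (assumes phi = ReLU).
    """
    k11, k22, k12 = K[0, 0], K[1, 1], K[0, 1]
    norm = np.sqrt(k11 * k22)
    cos_t = np.clip(k12 / norm, -1.0, 1.0)
    theta = np.arccos(cos_t)
    off = norm * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)
    return np.array([[k11 / 2, off], [off, k22 / 2]])

def deq_nngp_kernel(x, xp, sigma_w2=0.5, sigma_u2=1.0, tol=1e-12, max_iter=1000):
    """Iterate Sigma_l on the pair (x, xp) until it stops changing.

    sigma_w2 and sigma_u2 are illustrative values, not taken from the paper.
    """
    X = np.stack([x, xp])
    n_in = x.shape[0]
    sigma1 = sigma_u2 * (X @ X.T) / n_in          # Sigma_1 on the pair
    sigma = sigma_w2 * relu_expectation(sigma1)   # Sigma_2: Cov(u_1) = Sigma_1
    for _ in range(max_iter):
        # For l >= 2, Cov(u_l) = Sigma_l + Sigma_1.
        new = sigma_w2 * relu_expectation(sigma + sigma1)
        if np.max(np.abs(new - sigma)) < tol:
            return new
        sigma = new
    return sigma

# Two distinct inputs; the smallest eigenvalue of the 2x2 limit kernel
# is what the positive-definiteness result is about.
x = np.array([1.0, 0.0])
xp = np.array([0.0, 1.0])
S_star = deq_nngp_kernel(x, xp)
print(S_star)
print("smallest eigenvalue:", np.linalg.eigvalsh(S_star).min())
```

With ReLU, the diagonal entries satisfy \(\Sigma_{\ell+1}(x, x) = \tfrac{\sigma_w^2}{2}\,(\Sigma_\ell(x, x) + \Sigma_1(x, x))\), so under these assumptions the iteration converges whenever \(\sigma_w^2 < 2\). Other activations would need numerical quadrature in place of the closed form.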

Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models

Neural Network Gaussian Processes by Increasing Depth

Deep Neural Networks as Gaussian Processes

Wide Neural Networks with Bottlenecks are Deep Gaussian Processes

Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility

Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes

Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training

Deep quantum neural networks form Gaussian processes

An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network

GEQ: Gaussian Kernel Inspired Equilibrium Models

Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions

Neural Ordinary Differential Equations with Envolutionary Weights

Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

Deep Kernel Posterior Learning under Infinite Variance Prior Weights

Random ReLU Neural Networks as Non-Gaussian Processes

Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

Deep Q-Exponential Processes

Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations

Scalable Gaussian Process Regression Using Deep Neural Networks.