Abstract: Neural networks with wide layers have attracted significant attention due to their equivalence to Gaussian processes, enabling perfect fitting of training data while maintaining generalization performance, known as benign overfitting. However, existing results mainly focus on shallow or finite-depth networks, necessitating a comprehensive analysis of wide neural networks with infinite-depth layers, such as neural ordinary differential equations (ODEs) and deep equilibrium models (DEQs). In this paper, we specifically investigate the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers. Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process, establishing what is known as the Neural Network and Gaussian Process (NNGP) correspondence. Remarkably, this convergence holds even when the limits of depth and width are interchanged, which is not observed in typical infinite-depth Multilayer Perceptron (MLP) networks. Furthermore, we demonstrate that the associated Gaussian vector remains non-degenerate for any pairwise distinct input data, ensuring a strictly positive smallest eigenvalue of the corresponding kernel matrix using the NNGP kernel. These findings serve as fundamental elements for studying the training and generalization of DEQs, laying the groundwork for future research in this area.
### What problem does this paper attempt to address?
The paper asks two things about infinite-depth neural networks such as the deep equilibrium model (DEQ). First, do they converge to a Gaussian process (GP) as the width tends to infinity? Second, how does this convergence differ from that of finite-depth networks? Specifically, the authors focus on three questions:
1. **Exchangeability of the infinite-depth and infinite-width limits**: Existing research mainly concerns finite-depth networks. When both the depth and the width tend to infinity, can the order of the two limits be exchanged? For typical infinite-depth multilayer perceptrons (MLPs), this interchange is not observed. Settling the question is therefore crucial for understanding infinite-depth networks.
2. **Gaussian process behavior of the DEQ**: The DEQ is an infinite-depth architecture whose layers share a single weight matrix. The authors aim to prove that the DEQ converges to a Gaussian process as the layer width tends to infinity. They also aim to show that this conclusion still holds when the depth and width limits are interchanged.
3. **Positive definiteness of the covariance function**: For the limiting Gaussian process to be non-degenerate, its covariance function must be strictly positive definite on any set of pairwise distinct inputs. Equivalently, the smallest eigenvalue of the corresponding kernel matrix must be strictly positive.
### Main contributions
- **Established the exchangeability of the depth and width limits**: The authors prove that, for the DEQ, the two limits commute. Whether the depth limit or the width limit is taken first, the limiting Gaussian process is the same.
- **Proved the strict positive definiteness of the covariance function**: The authors show that when the activation function is non-polynomial, the limiting covariance function is strictly positive definite on the unit sphere. Hence the kernel matrix built from any pairwise distinct inputs is non-degenerate, with a strictly positive smallest eigenvalue.
- **Provided a theoretical basis**: These results are foundational for studying the training and generalization of DEQs and, more broadly, of infinite-depth neural networks.
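The exchangeability claim can be illustrated numerically. The sketch below assumes a DEQ of the form \(h = \phi(Wh + Ux)\) with ReLU activation. It draws \(W\) with i.i.d. \(N(0, \sigma_w^2/n)\) entries and \(U\) with i.i.d. \(N(0, \sigma_u^2/n_{in})\) entries. This parameterization is an assumption, chosen to match the covariance recursion in the formula summary.

The width, the variances, and the helper `deq_fixed_point` are hypothetical. Plain fixed-point iteration stands in for whatever equilibrium solver the paper uses.

The sketch compares two orders of limits:

- **Depth first:** solve for the equilibrium at finite width, then estimate \(\sigma_w^2\,\mathbb{E}[\phi(u)^2]\) by \(\sigma_w^2\) times the mean of \(h^{*2}\).
- **Width first, then depth:** for ReLU, \(\mathbb{E}[\phi(u)^2] = \operatorname{Var}(u)/2\). The diagonal of the recursion becomes \(\Sigma_{\ell+1} = \sigma_w^2(\Sigma_\ell + \Sigma_1)/2\), whose fixed point is \(\Sigma^* = \sigma_w^2\Sigma_1/(2 - \sigma_w^2)\).

```python
import numpy as np

def deq_fixed_point(W, U, x, tol=1e-10, max_iter=500):
    """Hypothetical helper: solve h = relu(W h + U x) by plain fixed-point iteration.

    Converges when the map is a contraction; since ReLU is 1-Lipschitz,
    ||W||_2 < 1 is sufficient.
    """
    inj = U @ x                      # input injection, shared by every "layer"
    h = np.zeros(W.shape[0])
    for _ in range(max_iter):
        h_next = np.maximum(W @ h + inj, 0.0)
        if np.linalg.norm(h_next - h) < tol:
            return h_next
        h = h_next
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(0)
n, n_in = 2000, 10                   # hypothetical width and input dimension
sigma_w2, sigma_u2 = 0.16, 1.0       # sigma_w = 0.4, so ||W||_2 is roughly 2 * 0.4 = 0.8
W = rng.normal(0.0, np.sqrt(sigma_w2 / n), size=(n, n))
U = rng.normal(0.0, np.sqrt(sigma_u2 / n_in), size=(n, n_in))
x = rng.normal(size=n_in)
x *= np.sqrt(n_in) / np.linalg.norm(x)   # normalize so that Sigma_1(x, x) = sigma_u2

# Depth limit first, at finite width: estimate sigma_w^2 * E[relu(u)^2] at the equilibrium.
h_star = deq_fixed_point(W, U, x)
empirical = sigma_w2 * np.mean(h_star ** 2)

# Width limit first, then depth: closed-form fixed point of the diagonal ReLU recursion.
sigma1 = sigma_u2 * (x @ x) / n_in
predicted = sigma_w2 * sigma1 / (2.0 - sigma_w2)
print(f"empirical {empirical:.4f}  predicted {predicted:.4f}")
```

Close agreement of the two numbers, up to finite-width fluctuation, is consistent with the interchangeability result but does not prove it. The check also covers only the diagonal of the kernel at a single input.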
### Formula summary
- Recursive definition of the covariance function:
\[
\begin{aligned}
&\Sigma_1(x, x') = \sigma_u^2 \langle x, x' \rangle / n_{in}, \\
&\Sigma_{\ell + 1}(x, x') = \sigma_w^2\, \mathbb{E}\big[\phi(u_\ell(x))\,\phi(u_\ell(x'))\big],
\end{aligned}
\]
where \((u_\ell(x), u_\ell(x'))\) is a zero-mean bivariate Gaussian vector with covariance
\[
\operatorname{Cov}\big(u_\ell(x), u_\ell(x')\big) =
\begin{cases}
\Sigma_1(x, x'), & \ell = 1, \\
\Sigma_\ell(x, x') + \Sigma_1(x, x'), & \ell \in [2, L - 1],
\end{cases}
\]
and the variances are obtained by setting \(x' = x\).
- Limit covariance function:
\[
\Sigma^*(x, x')=\lim_{\ell\rightarrow\infty}\Sigma_\ell(x, x').
\]
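These formulas characterize the DEQ in the infinite-width limit. The limit \(\Sigma^*\) serves as the covariance (NNGP) kernel of the limiting Gaussian process, and it is the basic object for subsequent analyses of DEQ training and generalization.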
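Below is a minimal sketch of the recursion and its limit. It rests on two assumptions:

- **Activation:** ReLU, for which the Gaussian expectation has the closed form of the degree-1 arc-cosine kernel.
- **Parameters:** hypothetical values \(\sigma_w^2 = 0.5\) and \(\sigma_u^2 = 1\).

The helper `deq_nngp_kernel` is illustrative, not taken from the paper. It iterates \(\Sigma_\ell\) until it stops changing, as a numerical stand-in for \(\Sigma^*\).

The sketch also contrasts ReLU with the identity activation, which is a polynomial, on five pairwise distinct points of the unit circle. With the identity, \(\Sigma^*\) is proportional to \(\langle x, x'\rangle\), so the \(5 \times 5\) kernel matrix has rank at most \(n_{in} = 2\) and is singular. The ReLU kernel matrix, by contrast, should have a strictly positive smallest eigenvalue, as the positive-definiteness result predicts for non-polynomial activations.

```python
import numpy as np

def relu_expect(cov):
    """E[relu(u_i) relu(u_j)] for a zero-mean Gaussian vector u with covariance `cov`.

    Uses the closed form sqrt(a b) / (2 pi) * (sin t + (pi - t) cos t), where cos t = c / sqrt(a b).
    """
    d = np.sqrt(np.diag(cov))
    scale = np.outer(d, d)
    cos_t = np.clip(cov / scale, -1.0, 1.0)
    theta = np.arccos(cos_t)
    return scale / (2.0 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos_t)

def linear_expect(cov):
    # Identity activation (a degree-1 polynomial): E[u_i u_j] is the covariance itself.
    return cov

def deq_nngp_kernel(X, expect, sigma_w2=0.5, sigma_u2=1.0, tol=1e-12, max_iter=1000):
    """Hypothetical helper: iterate the covariance recursion until Sigma_l stops changing."""
    s1 = sigma_u2 * (X @ X.T) / X.shape[1]   # Sigma_1(x, x') = sigma_u^2 <x, x'> / n_in
    s = sigma_w2 * expect(s1)                # Sigma_2, because Cov(u_1) = Sigma_1
    for _ in range(max_iter):
        s_next = sigma_w2 * expect(s + s1)   # Cov(u_l) = Sigma_l + Sigma_1 for l >= 2
        if np.max(np.abs(s_next - s)) < tol:
            return s_next, s1
        s = s_next
    raise RuntimeError("covariance recursion did not converge")

# Five pairwise distinct inputs on the unit circle (n_in = 2).
angles = 2.0 * np.pi * np.arange(5) / 5
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)

K_relu, s1 = deq_nngp_kernel(X, relu_expect)
K_lin, _ = deq_nngp_kernel(X, linear_expect)
print("smallest eigenvalue, ReLU:  ", np.linalg.eigvalsh(K_relu).min())
print("smallest eigenvalue, linear:", np.linalg.eigvalsh(K_lin).min())
```

As a sanity check on the ReLU case, the diagonal of `K_relu` should match the closed-form fixed point \(\sigma_w^2\Sigma_1(x,x)/(2 - \sigma_w^2)\). That value is \(0.5 \cdot 0.5 / 1.5 = 1/6\) here.

The identity recursion converges only for \(\sigma_w^2 < 1\), which is one reason for the moderate default.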