Two-Layer Neural Networks for Partial Differential Equations: Optimization and Generalization Theory

Tao Luo, Haizhao Yang
DOI: https://doi.org/10.48550/arXiv.2006.15733
2020-12-11
Abstract: The problem of solving partial differential equations (PDEs) can be formulated into a least-squares minimization problem, where neural networks are used to parametrize PDE solutions. A global minimizer corresponds to a neural network that solves the given PDE. In this paper, we show that the gradient descent method can identify a global minimizer of the least-squares optimization for solving second-order linear PDEs with two-layer neural networks under the assumption of over-parametrization. We also analyze the generalization error of the least-squares optimization for second-order linear PDEs and two-layer neural networks, when the right-hand-side function of the PDE is in a Barron-type space and the least-squares optimization is regularized with a Barron-type norm, without the over-parametrization assumption.
Numerical Analysis, Machine Learning, Optimization and Control
What problem does this paper attempt to address?
This paper develops optimization and generalization theory for solving partial differential equations (PDEs) with two-layer neural networks. Specifically, the authors focus on two core questions:

1. **Optimization convergence**: Under what conditions does gradient descent converge to a global minimizer of the least-squares loss for second-order linear PDEs?
2. **Generalization error analysis**: When the right-hand-side function of the PDE lies in a Barron-type space and the least-squares optimization is regularized with the path norm, without the over-parameterization assumption, how large is the gap between the empirical loss and the population (overall) loss at a global minimizer?

### Detailed Explanation

#### Optimization Convergence

The paper shows that, under the over-parameterization assumption, gradient descent can identify a global minimizer of the least-squares problem in which a two-layer neural network parametrizes the solution of a second-order linear PDE. Specifically, when the number of parameters in the network is large enough, gradient descent converges to a global minimum of the empirical loss at a linear rate.

#### Generalization Error Analysis

The authors also analyze the gap between the empirical risk and the population risk (the expected, or overall, risk) when a two-layer neural network is used to solve second-order linear PDEs. In particular, they prove that the a posteriori generalization error can be bounded in terms of the path norm of the trained parameters, and that the a priori generalization error can be bounded in terms of the Barron norm of the right-hand-side function.
### Summary of Mathematical Formulas

- **Empirical Risk**:
  \[ R_S(\theta):=\frac{1}{n}\sum_{i = 1}^n\ell(L\phi(x_i;\theta), f(x_i)) \]
- **Population (Overall) Risk**:
  \[ R_D(\theta):=\mathbb{E}_{x\sim U(\Omega)}[\ell(L\phi(x;\theta), f(x))] \]
- **Gradient Descent Update Rule** (continuous-time gradient flow):
  \[ \dot{\theta}=-\nabla_\theta R_S(\theta) \]
- **Linear Convergence Rate Theorem**:
  \[ R_S(\theta(t))\leq\exp\left(-\frac{m\lambda_S t}{n}\right)R_S(\theta_0) \]
- **A Posteriori Generalization Error Bound**:
  \[ |R_D(\theta)-R_S(\theta)|\leq\frac{(\|\theta\|_P + 1)^2}{\sqrt{n}}\cdot2M^2\left(14d^2\sqrt{2\log(2d)}+\log[\pi(\|\theta\|_P + 1)]+\sqrt{2\log\left(\frac{1}{3\delta}\right)}\right) \]
- **A Priori Generalization Error Bound**:
  \[ R_D(\theta_{S,\lambda})\leq\frac{6M^2\|f\|^2_B}{m}+\frac{\|f\|^2_B + 1}{\sqrt{n}}\left(4\lambda+16M^2\right)\left\{\log[\pi(2\|f\|_B + 1)]+14d^2\sqrt{\log(2d)}+\sqrt{\log\left(\frac{2}{3\delta}\right)}\right\} \]

Through these theoretical results, the authors provide a theoretical foundation for using deep-learning methods to solve partial differential equations and prepare the ground for further research on higher-order PDEs and applications in other fields.
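
To make the empirical risk \(R_S\) and the gradient descent rule concrete, the following is a minimal sketch in plain Python, not the authors' implementation. Everything specific in it is an assumption made for illustration: a one-dimensional operator \(Lu = -u'' + u\) on \(\Omega = (0, 1)\), a `tanh` activation, the squared loss \(\ell(y, y') = \tfrac{1}{2}(y - y')^2\), no boundary-condition handling, a hypothetical initialization, and an explicit Euler step of size `lr` in place of the continuous gradient flow. All function names are hypothetical.

```python
import math
import random


def unit_terms(x, w, b):
    """Return tanh(z) and its first three derivatives at z = w*x + b."""
    t = math.tanh(w * x + b)
    s = 1.0 - t * t                 # tanh'(z)
    d2 = -2.0 * t * s               # tanh''(z)
    d3 = s * (6.0 * t * t - 2.0)    # tanh'''(z)
    return t, s, d2, d3


def l_phi(x, theta):
    """Apply the assumed operator L u = -u'' + u to the two-layer network
    phi(x) = sum_k a_k * tanh(w_k * x + b_k)."""
    total = 0.0
    for a, w, b in theta:
        t, _, d2, _ = unit_terms(x, w, b)
        total += a * (t - w * w * d2)
    return total


def empirical_risk(theta, xs, f):
    """R_S(theta) = (1/n) * sum_i 0.5 * (L phi(x_i) - f(x_i))^2."""
    return sum(0.5 * (l_phi(x, theta) - f(x)) ** 2 for x in xs) / len(xs)


def risk_gradient(theta, xs, f):
    """Analytic gradient of R_S with respect to every (a_k, w_k, b_k)."""
    grad = [[0.0, 0.0, 0.0] for _ in theta]
    for x in xs:
        r = (l_phi(x, theta) - f(x)) / len(xs)   # residual, scaled by 1/n
        for g, (a, w, b) in zip(grad, theta):
            t, s, d2, d3 = unit_terms(x, w, b)
            g[0] += r * (t - w * w * d2)
            g[1] += r * a * (s * x - 2.0 * w * d2 - w * w * d3 * x)
            g[2] += r * a * (s - w * w * d3)
    return grad


def gradient_descent(theta, xs, f, lr=1e-3, steps=200):
    """Explicit Euler discretization of d(theta)/dt = -grad R_S(theta).
    Returns the final parameters and the risk after every step."""
    risks = [empirical_risk(theta, xs, f)]
    for _ in range(steps):
        grad = risk_gradient(theta, xs, f)
        theta = [[p - lr * g for p, g in zip(unit, gu)]
                 for unit, gu in zip(theta, grad)]
        risks.append(empirical_risk(theta, xs, f))
    return theta, risks


def init_theta(m, rng):
    """Hypothetical initialization: zero outer weights, uniform inner weights."""
    return [[0.0, rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
            for _ in range(m)]


def f_rhs(x):
    """Right-hand side for which u(x) = sin(pi x) satisfies -u'' + u = f."""
    return (1.0 + math.pi ** 2) * math.sin(math.pi * x)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(32)]       # x_i sampled from U(0, 1)
    theta, risks = gradient_descent(init_theta(10, rng), xs, f_rhs)
    print(f"initial risk {risks[0]:.4f}, final risk {risks[-1]:.4f}")
```

The width `m = 10` and sample size `n = 32` are chosen only to keep the script fast; they are far from the over-parameterized regime in which the linear convergence theorem applies, so the printed risks illustrate the mechanics of the update rule rather than the guaranteed rate.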