A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions

Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek
DOI: https://doi.org/10.1016/j.jco.2022.101646
IF: 1.333
2022-01-01
Journal of Complexity
Abstract: Gradient descent (GD) optimization algorithms are the standard ingredients used to train artificial neural networks (ANNs). However, even for the most basic variant of GD optimization algorithms, the plain vanilla GD method, it remains an open problem to this day to prove or disprove the conjecture that GD converges in the training of ANNs. In this article we solve this problem in the special situation where the target function under consideration is a constant function. More specifically, in the case of constant target functions we prove, for the training of rectified fully-connected feedforward ANNs with one hidden layer, that the risk function of the GD method does indeed converge to zero. A key contribution of this work is also to explicitly specify a Lyapunov function for the gradient flow system of the ANN parameters. This Lyapunov function is the central tool in our convergence proof.
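The setting described in the abstract can be illustrated with a minimal numerical sketch. This is not the authors' code and proves nothing. It runs plain GD on a one-hidden-layer ReLU network with a constant target and records the empirical risk. The function names (`init_params`, `risk_and_grad`, `train`), the one-dimensional input interval [0, 1], the uniform grid used to approximate the risk, the initialization, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def init_params(width, seed=0):
    # Hypothetical initialization; the choice of scales is an assumption.
    rng = np.random.default_rng(seed)
    return {
        "w": rng.normal(size=width),          # inner weights
        "b": rng.normal(size=width),          # inner biases
        "v": rng.normal(size=width) / width,  # outer weights
        "c": 0.0,                             # outer bias
    }

def risk_and_grad(p, x, target):
    # Forward pass of the one-hidden-layer ReLU network on the grid x.
    pre = np.outer(x, p["w"]) + p["b"]        # shape (n, width)
    act = np.maximum(pre, 0.0)
    out = act @ p["v"] + p["c"]
    err = out - target
    risk = np.mean(err ** 2)                  # grid approximation of the L2 risk

    # Backward pass; the ReLU derivative is taken to be 0 at the kink.
    mask = (pre > 0).astype(float)
    g_out = 2.0 * err / x.size                # d risk / d out, shape (n,)
    hidden = g_out[:, None] * mask * p["v"]   # shape (n, width)
    grad = {
        "w": (hidden * x[:, None]).sum(axis=0),
        "b": hidden.sum(axis=0),
        "v": act.T @ g_out,
        "c": g_out.sum(),
    }
    return risk, grad

def train(target=1.0, width=8, steps=2000, lr=0.01, n_grid=201):
    # Plain vanilla GD with a constant learning rate (all values are assumptions).
    x = np.linspace(0.0, 1.0, n_grid)
    p = init_params(width)
    risks = []
    for _ in range(steps):
        r, g = risk_and_grad(p, x, target)
        risks.append(r)
        for k in p:
            p[k] = p[k] - lr * g[k]
    risks.append(risk_and_grad(p, x, target)[0])
    return risks
```

Calling `risks = train()` and comparing `risks[0]` with `risks[-1]` shows how far the empirical risk has dropped for this particular seed and step size. The paper's result concerns the true risk under its own assumptions on the learning rate and initialization, which this sketch does not attempt to reproduce.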
mathematics, applied