Stochastic Gradient and Langevin Processes

Xiang Cheng, Dong Yin, Peter Bartlett, Michael Jordan
DOI: https://doi.org/10.48550/arxiv.1907.03215
2019-01-01
Abstract: We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation. We study the setup where the additive noise can be non-Gaussian and state-dependent and the potential function can be non-convex. We show that the key properties of these processes depend on the potential function and the second moment of the additive noise. We apply our theoretical findings to studying the convergence of Stochastic Gradient Descent (SGD) for non-convex problems and corroborate them with experiments using SGD to train deep neural networks on the CIFAR-10 dataset.
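To make the phrase "discrete Langevin-like process" concrete, here is a minimal one-dimensional sketch. It is not taken from the paper. It assumes an update of the form x_{k+1} = x_k - delta * grad U(x_k) + sqrt(delta) * xi(x_k, eta_k). The double-well potential, the noise scale sigma(x), the Rademacher noise eta_k, and all function names are illustrative assumptions, chosen only so that the potential is non-convex and the noise is non-Gaussian and state-dependent.

```python
import math
import random


def grad_potential(x):
    # Gradient of the assumed double-well potential U(x) = (x^2 - 1)^2 / 4,
    # which is non-convex with minima at x = -1 and x = +1.
    return x * (x * x - 1.0)


def noise_scale(x):
    # Assumed state-dependent noise magnitude, bounded in [0.5, 1.5].
    return 1.0 + 0.5 * math.sin(x)


def simulate_langevin_like(n_steps, delta, x0, seed=0):
    # Iterates x_{k+1} = x_k - delta * U'(x_k) + sqrt(delta) * sigma(x_k) * eta_k,
    # where eta_k is a Rademacher variable (+1 or -1), so the noise is non-Gaussian.
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        eta = rng.choice((-1.0, 1.0))
        x = x - delta * grad_potential(x) + math.sqrt(delta) * noise_scale(x) * eta
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate_langevin_like(n_steps=10_000, delta=0.01, x0=0.0, seed=0)
    # Fraction of time spent in the right-hand well, as a crude summary.
    print(sum(1 for x in traj if x > 0) / len(traj))
```

In this sketch, delta plays the role of a step size, and a histogram of the iterates for small delta gives a rough empirical picture of the distribution the process settles into.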