Novel Convergence Results of Adaptive Stochastic Gradient Descents
Tao Sun, Linbo Qiao, Qing Liao, Dongsheng Li
DOI: https://doi.org/10.1109/tip.2020.3038535
IF: 10.6
2021-01-01
IEEE Transactions on Image Processing
Abstract: Adaptive stochastic gradient descent, which uses unbiased samples of the gradient with stepsizes chosen from historical information, has been widely used to train neural networks for computer vision and pattern recognition tasks. This paper revisits the theoretical aspects of two classes of adaptive stochastic gradient descent methods, which contain several existing state-of-the-art schemes. We focus on presenting novel findings. In the general smooth case, nonergodic convergence results are given; that is, the expectation of the gradient norm at the current iterate, rather than the minimum over past iterates, is proved to converge. We also study the performance of these methods when the objective function satisfies the Polyak-Łojasiewicz property. In this case, nonergodic convergence rates are given for the expectation of the function values. Our findings show that stronger restrictions on the stepsizes are needed to guarantee the convergence (and convergence rates) of the nonergodic function values.
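To make the terms in the abstract concrete, the sketch below runs one well-known adaptive scheme, AdaGrad-Norm, whose single stepsize is built from the history of squared stochastic-gradient norms. This is an illustration only: the abstract does not say which two classes of methods the paper analyzes, so AdaGrad-Norm, the quadratic objective, the Gaussian noise model, and the parameters `alpha`, `b0`, and `noise` are all assumptions. The objective f(x) = 0.5 * sum_i a_i x_i^2 with a_i > 0 satisfies the Polyak-Łojasiewicz inequality (1/2)||∇f(x)||^2 ≥ μ(f(x) − f*) with μ = min_i a_i. The code records both a nonergodic quantity (the squared gradient norm at the final iterate) and an ergodic-style one (the minimum over all past iterates).

```python
import math
import random


def grad(x, a):
    # Exact gradient of the assumed objective f(x) = 0.5 * sum_i a_i * x_i^2
    return [ai * xi for ai, xi in zip(a, x)]


def f(x, a):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def adagrad_norm_sgd(x0, a, steps, alpha=1.0, b0=1.0, noise=0.1, seed=0):
    """AdaGrad-Norm (chosen for illustration, not taken from the paper):
    x_{t+1} = x_t - alpha / sqrt(b0^2 + sum_{s<=t} ||g_s||^2) * g_t."""
    rng = random.Random(seed)
    x = list(x0)
    acc = b0 * b0  # running sum b0^2 + sum_s ||g_s||^2 (historical information)
    history = []   # (f(x_t), ||grad f(x_t)||^2) for every iterate x_0..x_T
    for _ in range(steps):
        g_true = grad(x, a)
        history.append((f(x, a), sum(gi * gi for gi in g_true)))
        # Unbiased stochastic gradient: exact gradient plus zero-mean Gaussian noise
        g = [gi + rng.gauss(0.0, noise) for gi in g_true]
        acc += sum(gi * gi for gi in g)
        eta = alpha / math.sqrt(acc)  # stepsize shrinks as the history grows
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    g_true = grad(x, a)
    history.append((f(x, a), sum(gi * gi for gi in g_true)))
    return x, history


# Usage: a 2-D quadratic that is mu-PL with mu = min(a) = 1.0
a = [1.0, 4.0]
x_T, hist = adagrad_norm_sgd([3.0, -2.0], a, steps=2000)
last_grad_sq = hist[-1][1]               # nonergodic: final iterate only
min_grad_sq = min(g for _, g in hist)    # ergodic-style: best past iterate
print(f"f(x_T) = {hist[-1][0]:.4g}, ||grad f(x_T)||^2 = {last_grad_sq:.4g}, "
      f"min_t ||grad f(x_t)||^2 = {min_grad_sq:.4g}")
```

The ergodic-style minimum can never exceed the final-iterate value, which is why a guarantee on the last iterate, as in the nonergodic results described above, is the stronger statement.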
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic