SVRG with Adaptive Epoch Size.

Erxue Min, Yawei Zhao, Jun Long, Chengkun Wu, Kuan Li, Jianping Yin
DOI: https://doi.org/10.1109/ijcnn.2017.7966219
2017-01-01
Abstract: Stochastic gradient descent (SGD) is a commonly used technique in large-scale machine learning tasks, but it converges slowly because of its inherent variance. In recent years, a popular method, Stochastic Variance Reduced Gradient (SVRG), has addressed this shortcoming by computing the full gradient over the entire dataset in each epoch. However, conventional SVRG and its variants usually require choosing a hyperparameter, the epoch size, which is essential to convergence performance. Few previous studies discuss how to systematically find a suitable value for this hyperparameter, which makes it hard to achieve good convergence performance in practical machine learning applications. In this paper, we propose a new stochastic gradient descent method named AESVRG, which introduces variance reduction and computes the full gradient adaptively. Its enhanced implementation, AESVRG+, achieves convergence performance that can outperform existing SVRG with fine-tuned epoch sizes. An extensive evaluation illustrates the significant performance improvement of our method.
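For orientation, the conventional fixed-epoch SVRG baseline that the abstract refers to can be sketched roughly as below. This is not AESVRG: the abstract does not state the adaptive rule, so none is attempted here. The least-squares objective, the function name `svrg_least_squares`, and the parameters `step`, `epoch_size`, and `n_epochs` are all assumptions made for illustration.

```python
import random


def svrg_least_squares(xs, ys, step=0.05, epoch_size=None, n_epochs=40, seed=0):
    """Sketch of conventional SVRG on (1/2n) * sum_i (w . x_i - y_i)^2.

    Hypothetical helper for illustration only; epoch_size is the
    hand-tuned hyperparameter (inner-loop length) discussed in the abstract.
    """
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    m = epoch_size if epoch_size is not None else 2 * n  # assumed default

    def grad_i(w, i):
        # Gradient of the i-th squared-error term at w.
        residual = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
        return [residual * xj for xj in xs[i]]

    def full_grad(w):
        # Average gradient over the entire dataset.
        g = [0.0] * d
        for i in range(n):
            g = [a + b / n for a, b in zip(g, grad_i(w, i))]
        return g

    w_snap = [0.0] * d
    for _ in range(n_epochs):
        mu = full_grad(w_snap)  # one full-gradient pass per epoch
        w = list(w_snap)
        for _ in range(m):  # fixed number of inner steps
            i = rng.randrange(n)
            g_w = grad_i(w, i)
            g_snap = grad_i(w_snap, i)
            # Variance-reduced gradient: g_i(w) - g_i(w_snap) + mu
            w = [wj - step * (a - b + c)
                 for wj, a, b, c in zip(w, g_w, g_snap, mu)]
        w_snap = w  # the last inner iterate becomes the next snapshot
    return w_snap
```

In this sketch `epoch_size` fixes how many stochastic steps run between two full-gradient computations, which is the quantity the paper proposes to choose adaptively rather than by manual tuning.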