FVR-SGD: A New Flexible Variance-Reduction Method for SGD on Large-Scale Datasets

Mingxing Tang, Zhen Huang, Linbo Qiao, Shuyang Du, Yuxing Peng, Changjian Wang
DOI: https://doi.org/10.1007/978-3-030-04179-3_16
2018-01-01
Abstract: Stochastic gradient descent (SGD) is a popular optimization method widely used in machine learning, but the variance of its gradient estimates leads to slow convergence. To accelerate convergence, many variance reduction methods have been proposed. However, most of these methods require additional memory or the computation of a full gradient, which makes them inefficient or even infeasible in real-world applications with large-scale datasets. To address this issue, we propose a new flexible variance reduction method for SGD, named FVR-SGD, which reduces memory overhead and speeds up convergence by using a flexible subset size without extra operations. We present the convergence analysis and show that convergence of the variance reduction method with a flexible subset size is guaranteed. Several numerical experiments are conducted on a range of real-world large-scale datasets. The experimental results demonstrate that FVR-SGD outperforms the contemporary SVRG algorithm. Specifically, the proposed method achieves up to a 40% reduction in training time when solving the logistic regression optimization problem.
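For context, the SVRG baseline named in the abstract can be sketched as follows for logistic regression. This is a minimal illustration of standard SVRG, not the FVR-SGD algorithm, whose update rule the abstract does not give. The function names and the `snapshot_size` parameter are hypothetical: setting `snapshot_size` below the dataset size estimates the snapshot gradient on a random subset, which is only a loose stand-in for the "flexible subset size" idea and makes the correction term approximate.

```python
import numpy as np


def logistic_loss(w, X, y):
    """Mean logistic loss; labels y are assumed to be in {-1, +1}."""
    return float(np.mean(np.logaddexp(0.0, -y * (X @ w))))


def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss over the rows of X."""
    margins = y * (X @ w)
    # sigmoid(-m) written as 0.5 * (1 - tanh(m / 2)) for numerical stability
    coeff = -y * 0.5 * (1.0 - np.tanh(0.5 * margins))
    return X.T @ coeff / X.shape[0]


def svrg(X, y, step=0.1, epochs=10, inner_steps=None, snapshot_size=None, seed=0):
    """Standard SVRG; snapshot_size (hypothetical) < n uses a subset snapshot."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    inner_steps = inner_steps or n
    snapshot_size = snapshot_size or n
    w = np.zeros(d)
    for _ in range(epochs):
        # Outer loop: freeze a snapshot and compute its (sub)set gradient.
        w_snap = w.copy()
        idx = rng.choice(n, size=snapshot_size, replace=False)
        mu = logistic_grad(w_snap, X[idx], y[idx])
        for _ in range(inner_steps):
            # Inner loop: variance-reduced stochastic gradient step.
            i = rng.integers(n)
            xi, yi = X[i:i + 1], y[i:i + 1]
            g = logistic_grad(w, xi, yi) - logistic_grad(w_snap, xi, yi) + mu
            w -= step * g
    return w
```

With `snapshot_size=None` the snapshot gradient is computed over the full dataset, which is the per-epoch cost the abstract identifies as a burden on large-scale data.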