IS-ASGD

Fei Wang, Xiaofeng Gao, Jun Ye, Guihai Chen
DOI: https://doi.org/10.1145/3225058.3225135
2018-01-01
Abstract: Considerable effort has recently gone into variance reduction (VR) techniques that accelerate the convergence of the stochastic gradient descent (SGD) algorithm. Two VR variants, stochastic variance-reduced gradient (SVRG-SGD) and importance sampling (IS-SGD), have made remarkable progress. Meanwhile, asynchronous SGD (ASGD) is becoming more critical due to the ever-increasing scale of optimization problems. Applying VR to ASGD to accelerate its convergence has therefore attracted much interest, and SVRG-ASGD methods have been proposed. However, we found that SVRG performs unsatisfactorily in accelerating ASGD when datasets are sparse and large-scale. In such cases, the per-iteration computation cost of SVRG-ASGD is orders of magnitude higher than that of plain ASGD, which makes it very inefficient. IS, on the other hand, achieves an improved convergence rate with little extra computation cost and is invariant to dataset sparsity. These advantages make it well suited to accelerating ASGD on large-scale sparse datasets. In this paper, we propose IS-ASGD, a novel ASGD combined with IS for efficient convergence acceleration. We theoretically prove a superior convergence bound for IS-ASGD, and experimental results support our claims.
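To make the importance sampling idea concrete, here is a minimal sketch of IS-SGD on a toy least-squares problem. It is not the paper's IS-ASGD algorithm: the loop is sequential rather than asynchronous, the objective is an assumption, and the sampling distribution (proportional to squared row norms, a common smoothness proxy for least squares) is chosen for illustration only. All function names and the data are hypothetical.

```python
import random

def sampling_probs(rows):
    """Sampling probabilities proportional to squared row norms (assumed proxy)."""
    weights = [sum(v * v for v in row) for row in rows]
    total = sum(weights)
    return [wt / total for wt in weights]

def sample_grad(w, row, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w for one sample."""
    err = sum(wi * xi for wi, xi in zip(w, row)) - y
    return [err * xi for xi in row]

def is_sgd(rows, ys, steps=2000, lr=0.05, seed=0):
    """Sequential importance-sampled SGD (illustrative, not asynchronous)."""
    rng = random.Random(seed)
    n = len(rows)
    probs = sampling_probs(rows)  # computed once, so the extra cost is small
    w = [0.0] * len(rows[0])
    for _ in range(steps):
        # Draw a sample index from the non-uniform distribution.
        i = rng.choices(range(n), weights=probs, k=1)[0]
        # Reweight by 1 / (n * p_i) so the gradient estimate stays unbiased.
        scale = 1.0 / (n * probs[i])
        g = sample_grad(w, rows[i], ys[i])
        w = [wi - lr * scale * gi for wi, gi in zip(w, g)]
    return w

# Toy, noise-free data generated from the hypothetical weights [2.0, -1.0].
rows = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
true_w = [2.0, -1.0]
ys = [sum(a * b for a, b in zip(true_w, row)) for row in rows]
w_hat = is_sgd(rows, ys)
print(w_hat)
```

The only additions over plain SGD are the one-time computation of `probs` and the `1 / (n * p_i)` rescaling of each step, which is consistent with the abstract's point that IS adds little extra computation per iteration.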