Adaptive Proximal SGD Based on New Estimating Sequences for Sparser ERM

Zhuan Zhang, Shuisheng Zhou
DOI: https://doi.org/10.1016/j.ins.2023.118965
IF: 8.1
2023-01-01
Information Sciences
Abstract: Estimating sequences, introduced by Nesterov, are an efficient technique for accelerating gradient descent (GD). Their stochastic version has also been used successfully to speed up stochastic gradient descent (SGD). For non-smooth convex optimization problems, SGD with stochastic estimating sequences converges at a rate of O(1/k), where k is the number of iterations. In this paper, we present a new way of constructing estimating sequences. The new estimating sequences replace the subgradient with the proximal stochastic gradient, and their novelty is to replace the fixed learning rate with an adaptive learning rate computed from the exponential moving average of past squared stochastic gradients. Based on the new estimating sequences, we propose an adaptive proximal SGD algorithm, called ES-APSGD, for solving large-scale ℓ1-norm regularized empirical risk minimization (ERM). ES-APSGD simplifies the calculation and attains a convergence rate of O(1/k²). Its significant advantage is that it strengthens the sparsity of the solution by adaptively adjusting the threshold magnitude. Experimental results on Lasso and ℓ1-norm regularized logistic regression show that ES-APSGD speeds up convergence and obtains sparser solutions.
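To make the adaptive-threshold idea concrete, here is a minimal sketch of one adaptive proximal stochastic gradient step for an ℓ1-regularized objective. It is not the authors' ES-APSGD: the estimating-sequence construction behind the O(1/k²) rate is omitted, and the function names, the RMSProp-style scaling `base_lr / (sqrt(v) + eps)`, and the default values are all assumptions made for illustration. It only shows how a per-coordinate learning rate derived from an exponential moving average of squared gradients also rescales the soft-thresholding level.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of threshold * |x|: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def adaptive_prox_step(w, v, grad, lam, base_lr=0.1, beta=0.9, eps=1e-8):
    """One illustrative adaptive proximal SGD step (hypothetical helper).

    w    : current weights
    v    : exponential moving average of past squared stochastic gradients
    grad : stochastic gradient of the smooth loss at w
    lam  : l1 regularization strength
    Returns the updated (w, v) as new lists.
    """
    new_w, new_v = [], []
    for wj, vj, gj in zip(w, v, grad):
        # Update the moving average of squared gradients for this coordinate.
        vj = beta * vj + (1.0 - beta) * gj * gj
        # Assumed RMSProp-style per-coordinate learning rate.
        lr = base_lr / (math.sqrt(vj) + eps)
        # Gradient step, then soft-thresholding with the adaptive level lr * lam.
        new_w.append(soft_threshold(wj - lr * gj, lr * lam))
        new_v.append(vj)
    return new_w, new_v


if __name__ == "__main__":
    # Two coordinates: one with a large gradient history, one with a small one.
    w, v = adaptive_prox_step(
        w=[1.0, 0.3], v=[1.0, 0.01], grad=[1.0, 0.1], lam=0.5
    )
    print(w, v)
```

In this sketch a coordinate whose recent gradients are small gets a larger learning rate and therefore a larger threshold `lr * lam`, so it is driven to exactly zero more readily. That is one plausible reading of how adaptively adjusting the threshold magnitude can strengthen sparsity.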