The Individual Convergence of Projected Subgradient Methods Using the Nesterov's Step-Size Strategy
Wei TAO, Zhi-Song PAN, De-Jun CHU, Qing TAO
DOI: https://doi.org/10.11897/SP.J.1016.2018.00164
2018-01-01
Chinese Journal of Computers
Abstract: Many problems arising in machine learning can ultimately be reduced to optimization problems, and convex optimization algorithms have been successfully adapted to many kinds of learning problems. Whether the optimal convergence rate can be attained is one of the basic questions in the study of optimization algorithms. Sparsity is a further concern in sparse learning problems. To date, a great many stochastic optimization algorithms have been proposed for large-scale learning problems. However, most state-of-the-art stochastic optimization algorithms attain the optimal convergence rate only in terms of the averaged output, and the desired sparsity cannot be guaranteed. Compared with the averaged output, the individual solution (that is, a single iterate rather than an average of iterates) usually offers better sparsity. Unfortunately, it is not easy to make the individual convergence rate optimal, and the optimal individual convergence rate in strongly convex cases has been extensively studied as an open problem. For smooth objectives, it is well known that Nesterov's step-size rule can accelerate the convergence rate of first-order gradient algorithms by orders of magnitude, and that the optimal individual convergence rate is obtained at the same time. Recently, Nesterov's acceleration has been widely applied to learning problems with smooth loss functions, and a large number of stochastic optimization algorithms for the smooth case have been developed from it. A natural question is whether Nesterov's step-size rule can be extended to obtain the optimal individual convergence rate for nonsmooth objectives. In this paper, Nesterov's step-size rule from the smooth case is incorporated into the gradient method for nonsmooth optimization problems. In particular, focusing on classic first-order gradient methods, we present a new projected subgradient method with Nesterov's step-size rule. We prove that the proposed method achieves the optimal individual convergence rate for nonsmooth optimization problems. This conclusion is stronger than the earlier result that the regular projected subgradient method attains the optimal convergence rate only in terms of the averaged output, and it can also be regarded as an approximate answer to the question of whether first-order gradient methods can achieve the optimal individual convergence rate. In contrast to regular projected subgradient methods, which use the averaged output, and modified projected subgradient methods, which employ a linear interpolation operation, in our method the subgradient-like operation follows the extrapolation evaluation, which brings significant benefits in preserving sparsity when optimizing the hinge loss on an l1-norm ball. Experiments on two synthetic datasets verify the theoretical analysis, and experiments on several benchmark datasets demonstrate that the proposed method has almost the same convergence behavior while offering better sparsity. As future work, the optimal individual convergence in regularized sparse learning problems and the stability of individual convergence in stochastic optimization will be considered. Whether Nesterov's step-size rule can achieve the optimal individual convergence for strongly convex objective functions will also be investigated.
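The setting named in the abstract (hinge loss minimized over an l1-norm ball, with the subgradient evaluated at an extrapolated point) can be illustrated with a minimal Python sketch. The abstract does not state the paper's step-size or extrapolation parameters, so everything numeric here is an assumption chosen for illustration: the extrapolation weight (t - 1) / (t + 2), the step size c / sqrt(t), and the function names project_l1_ball, hinge_subgradient and projected_subgradient are hypothetical and should not be read as the authors' algorithm. The l1-ball projection uses the standard sort-and-threshold procedure.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-and-threshold)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]              # magnitudes in decreasing order
    cssv = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * ks > cssv - radius)[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)
    # Soft-thresholding sets small coordinates exactly to zero.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def hinge_subgradient(w, X, y):
    """One subgradient of the average hinge loss mean(max(0, 1 - y * Xw))."""
    active = y * (X @ w) < 1.0
    return -(X[active].T @ y[active]) / len(y)

def projected_subgradient(X, y, radius=1.0, steps=200, c=1.0, extrapolate=True):
    """Return (last iterate, averaged iterate).

    The extrapolation weight and step size below are illustrative assumptions,
    not the parameters analysed in the paper.
    """
    d = X.shape[1]
    w_prev, w, w_sum = np.zeros(d), np.zeros(d), np.zeros(d)
    for t in range(1, steps + 1):
        beta = (t - 1) / (t + 2) if extrapolate else 0.0
        z = w + beta * (w - w_prev)           # extrapolated point
        g = hinge_subgradient(z, X, y)        # subgradient taken at z
        w_prev, w = w, project_l1_ball(z - (c / np.sqrt(t)) * g, radius)
        w_sum += w
    return w, w_sum / steps

# Small synthetic problem with a sparse ground-truth weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
w_true = np.zeros(20)
w_true[:3] = [1.0, -1.0, 0.5]
y = np.sign(X @ w_true)
w_last, w_avg = projected_subgradient(X, y)
print("nonzeros in last iterate:    ", np.count_nonzero(w_last))
print("nonzeros in averaged iterate:", np.count_nonzero(w_avg))
```

Each projection can zero out coordinates, whereas averaging many iterates tends to keep any coordinate that was ever nonzero. This is one way to see why an individual iterate may be sparser than the averaged output, although the sketch says nothing about convergence rates.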