Abstract: Many problems arising in machine learning can ultimately be reduced to optimization problems, and convex optimization algorithms have been successfully adapted to a wide variety of learning problems. Whether an algorithm attains the optimal convergence rate is one of the basic questions in the study of optimization algorithms, and in sparse learning problems the sparsity of the solution is a further concern. A great many stochastic optimization algorithms have been proposed for large-scale learning problems. However, most state-of-the-art stochastic optimization algorithms attain the optimal convergence rate only for the averaged output, which does not guarantee the desired sparsity. In contrast to the averaged output, the individual solution (the iterate itself) usually offers better sparsity. Unfortunately, it is not easy to make the individual convergence rate optimal, and the optimal individual convergence rate in the strongly convex case has long been studied as an open problem. For smooth objectives, it is well known that Nesterov's step-size rule accelerates first-order gradient methods from O(1/t) to O(1/t^2) and at the same time yields the optimal individual convergence rate. Nesterov's acceleration has recently been applied widely to learning problems with smooth loss functions, and many stochastic optimization algorithms for the smooth case have been built on Nesterov's acceleration strategy. A natural question is therefore whether Nesterov's step-size rule can be extended to obtain the optimal individual convergence rate for nonsmooth objectives.

In this paper, we incorporate Nesterov's step-size rule for smooth objectives into the gradient method for nonsmooth optimization. Focusing on the classic first-order gradient methods, we present a new projected subgradient method with Nesterov's step-size rule and prove that it achieves the optimal individual convergence rate on nonsmooth optimization problems. This result is stronger than the earlier one, in which the regular projected subgradient method attains the optimal rate only for the averaged output, and it can also be regarded as an approximate answer to the question of whether first-order gradient methods can achieve the optimal individual convergence rate. Compared with the regular projected subgradient method, which uses the averaged output, and with modified projected subgradient methods, which employ a linear interpolation operation, our method takes the subgradient step after an extrapolation step, which is significantly better at preserving sparsity when minimizing the hinge loss over an l1-norm ball. Experiments on two synthetic datasets confirm the theoretical analysis, and experiments on several benchmark datasets show that the proposed method has almost the same convergence behavior as the baselines while producing sparser solutions. As future work, we will consider the optimal individual convergence of regularized sparse learning problems and the stability of individual convergence in stochastic optimization, and we will investigate whether Nesterov's step-size rule can achieve the optimal individual convergence rate for strongly convex objectives.
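The contrast between averaged and individual outputs can be illustrated with a small sketch. It is not the authors' algorithm: the extrapolation weight, the schedule theta_t = 2/(t+1), the step size c/sqrt(t), the toy data generator, and every function name (`project_l1_ball`, `projected_subgradient`, `nesterov_projected_subgradient`, `make_synthetic`) are hypothetical choices made for illustration, since the abstract does not state the exact step-size rule. The code minimizes the average hinge loss over an l1-norm ball with (a) the regular projected subgradient method, reporting both its last iterate and its averaged output, and (b) an assumed variant that evaluates each subgradient at an extrapolated point and reports only the last iterate. The projection is the standard sort-based Euclidean projection onto the l1 ball.

```python
import math
import random

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def nnz(w):
    """Number of exactly nonzero coordinates."""
    return sum(1 for v in w if v != 0.0)

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-based, radius > 0)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        candidate = (cumsum - radius) / j
        if uj > candidate:  # holds for j = 1..rho, then fails
            theta = candidate
    # Soft-threshold by theta: coordinates with |x| <= theta become exactly zero.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]

def hinge_loss(w, X, y):
    """Average hinge loss (1/n) * sum_i max(0, 1 - y_i <x_i, w>)."""
    return sum(max(0.0, 1.0 - yi * dot(xi, w)) for xi, yi in zip(X, y)) / len(X)

def hinge_subgradient(w, X, y):
    """One subgradient of the average hinge loss at w."""
    n, g = len(X), [0.0] * len(w)
    for xi, yi in zip(X, y):
        if yi * dot(xi, w) < 1.0:  # margin violated: term contributes -y_i x_i / n
            for j, xij in enumerate(xi):
                g[j] -= yi * xij / n
    return g

def projected_subgradient(X, y, radius, T, c=1.0):
    """Regular projected subgradient method; returns (last iterate, averaged output)."""
    x = [0.0] * len(X[0])
    avg = [0.0] * len(x)
    for t in range(1, T + 1):
        g = hinge_subgradient(x, X, y)
        step = c / math.sqrt(t)
        x = project_l1_ball([xj - step * gj for xj, gj in zip(x, g)], radius)
        avg = [a + (xj - a) / t for a, xj in zip(avg, x)]  # running mean of x_1..x_t
    return x, avg

def nesterov_projected_subgradient(X, y, radius, T, c=1.0):
    """Hypothetical extrapolated variant: subgradient step taken at y_t; last iterate returned."""
    x_prev = [0.0] * len(X[0])
    x = list(x_prev)
    theta_prev = 1.0
    for t in range(1, T + 1):
        theta = 2.0 / (t + 1)  # assumed schedule, theta_1 = 1
        beta = theta * (1.0 / theta_prev - 1.0)  # extrapolation weight, tends to 1
        y_t = [xj + beta * (xj - pj) for xj, pj in zip(x, x_prev)]
        g = hinge_subgradient(y_t, X, y)  # subgradient at the extrapolated point
        step = c / math.sqrt(t)
        x_prev, x = x, project_l1_ball([yj - step * gj for yj, gj in zip(y_t, g)], radius)
        theta_prev = theta
    return x

def make_synthetic(n, d, k, seed=0):
    """Hypothetical toy data: labels generated by a k-sparse ground-truth vector."""
    rng = random.Random(seed)
    w_true = [0.0] * d
    for j in rng.sample(range(d), k):
        w_true[j] = rng.choice([-1.0, 1.0])
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1.0 if dot(xi, w_true) >= 0.0 else -1.0 for xi in X]
    return X, y

X, y = make_synthetic(n=60, d=30, k=3, seed=0)
R, T = 2.0, 300
x_last, x_avg = projected_subgradient(X, y, R, T)
x_nest = nesterov_projected_subgradient(X, y, R, T)
for name, w in [("PSG last iterate", x_last), ("PSG averaged output", x_avg),
                ("extrapolated last iterate", x_nest)]:
    print(f"{name:>26}: hinge loss = {hinge_loss(w, X, y):.4f}, nonzeros = {nnz(w)}/{len(w)}")
```

Each iterate is a projection onto the l1 ball, which sets small coordinates exactly to zero, so individual iterates tend to be sparse; the averaged output mixes iterates with different supports and is typically denser. The printed nonzero counts let you check this on the toy data, but the exact numbers depend on the seed, radius, and step-size constant.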

The Individual Convergence of Projected Subgradient Methods Using the Nesterov's Step-Size Strategy