Gradient methods for convex minimization: better rates under weaker conditions

Hui Zhang, Wotao Yin
2013-01-01
Abstract: The convergence behavior of gradient methods for minimizing convex differentiable functions is one of the core questions in convex optimization. This paper shows that their well-known complexities can be achieved under conditions weaker than the commonly accepted ones. We relax the common gradient Lipschitz-continuity condition and strong convexity condition to ones that hold only over certain line segments. Specifically, we establish complexities $O(\frac{R}{\epsilon})$ and $O(\sqrt{\frac{R}{\epsilon}})$ for the ordinary and accelerated gradient methods, respectively, assuming that $\nabla f$ is Lipschitz continuous with constant $R$ over the line segment joining $x$ and $x-\frac{1}{R}\nabla f(x)$ for each $x\in\mathrm{dom}\, f$. We then improve them to $O(\frac{R}{\nu}\log(\frac{1}{\epsilon}))$ and $O(\sqrt{\frac{R}{\nu}}\log(\frac{1}{\epsilon}))$ for functions $f$ that also satisfy the secant inequality $\langle \nabla f(x), x - x^* \rangle \ge \nu\|x-x^*\|^2$ for each $x\in \mathrm{dom}\, f$ and its projection $x^*$ onto the minimizer set of $f$. The secant condition is also shown to be necessary for the geometric decay of the solution error. Not only are the relaxed conditions met by more functions, but restricting them to line segments also gives a smaller $R$ and a larger $\nu$ than the unrestricted conditions do, and thus leads to better complexity bounds. We apply these results to sparse optimization and demonstrate a faster algorithm.
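As an illustration of the secant inequality, here is a minimal Python sketch (not taken from the paper) of the ordinary gradient method with step size $1/R$ on a toy function that is convex but not strongly convex. The function $f(x) = \frac{1}{2}(x_1-1)^2 + 2(x_2-1)^2$ on $\mathbb{R}^3$, the starting point, and all helper names are assumptions chosen for this example. Its minimizer set is the line $\{(1,1,t)\}$, and it satisfies the secant inequality with $\nu = 1$, while $\nabla f$ is Lipschitz with $R = 4$.

```python
import math

# Toy problem (an assumption for illustration, not from the paper):
# f(x) = 0.5*(x[0]-1)**2 + 2*(x[1]-1)**2, with x[2] not appearing in f.
# f is convex but not strongly convex on R^3; its minimizers are (1, 1, t).
R = 4.0   # Lipschitz constant of grad f (largest Hessian eigenvalue)
NU = 1.0  # secant-inequality constant (smallest nonzero Hessian eigenvalue)

def grad_f(x):
    """Gradient of the toy function."""
    return [x[0] - 1.0, 4.0 * (x[1] - 1.0), 0.0]

def project_to_minimizers(x):
    """Euclidean projection of x onto the minimizer set {(1, 1, t)}."""
    return [1.0, 1.0, x[2]]

def dist_to_minimizers(x):
    """Distance from x to its projection x* on the minimizer set."""
    p = project_to_minimizers(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p)))

def gradient_descent(x0, steps):
    """Ordinary gradient method with the fixed step size 1/R."""
    x = list(x0)
    dists = [dist_to_minimizers(x)]
    for _ in range(steps):
        g = grad_f(x)
        x = [xi - gi / R for xi, gi in zip(x, g)]
        dists.append(dist_to_minimizers(x))
    return x, dists

x_final, dists = gradient_descent([3.0, -2.0, 5.0], steps=20)
```

In this toy run the distance to the minimizer set shrinks by a factor of at most $1 - \nu/R = 3/4$ per iteration even though $f$ is flat along the third coordinate, which is the kind of geometric decay the secant condition is meant to capture. The accelerated method is not shown here.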