Abstract: In this work, we consider the convergence of Polyak's heavy ball method, both in continuous and discrete time, on a non-convex objective function. We recover the convergence rates derived in [Polyak, U.S.S.R. Comput. Math. and Math. Phys., 1964] for strongly convex objective functions, assuming only validity of the Polyak-Lojasiewicz inequality. In continuous time our result holds for all initializations, whereas in the discrete time setting we conduct a local analysis around the global minima. Our results demonstrate that the heavy ball method does, in fact, accelerate on the class of objective functions satisfying the Polyak-Lojasiewicz inequality. This holds even in the discrete time setting, provided the method reaches a neighborhood of the global minima. Instead of the usually employed Lyapunov-type arguments, our approach leverages a new differential geometric perspective of the Polyak-Lojasiewicz inequality proposed in [Rebjock and Boumal, Math. Program., 2024].

What problem does this paper attempt to address?

The main problem this paper addresses is the convergence analysis of Polyak's heavy ball method on non-convex objective functions satisfying the Polyak-Łojasiewicz (PL) inequality. Specifically, the paper aims to prove:

1. **Convergence rate in the continuous-time setting**: For all initializations, the heavy ball dynamics attain an accelerated convergence rate in continuous time.
2. **Local convergence rate in the discrete-time setting**: Once the method reaches a neighborhood of the global minima, it also attains an accelerated convergence rate in discrete time.

### Detailed problem description

#### Research background

In optimization problems, especially in machine-learning tasks, first-order methods such as gradient descent play a central role. To accelerate convergence, momentum methods update the iterates using information stored from past steps, which smooths the oscillations of standard gradient descent and speeds up convergence on ill-conditioned problems. However, relatively few studies have proven these advantages theoretically, and the existing results are usually limited to specific classes of objective functions.

#### Polyak's heavy ball method

Polyak's heavy ball method is an optimization algorithm with a momentum term. Its iteration is:

\[ x_{n + 1}=x_n-\gamma\nabla f(x_n)+\beta(x_n - x_{n - 1}), \]

where \(\gamma\) is the learning rate and \(\beta\) is the momentum parameter. The method can be regarded as a discretization of the heavy-ball differential equation:

\[ \dot{x}_t = v_t, \]
\[ \dot{v}_t=-\alpha v_t-\nabla f(x_t), \]

where \(\alpha\) is the friction parameter.

#### PL inequality

The PL inequality is defined as:

\[ \|\nabla f(x)\|^2\geq2\mu(f(x)-f(x^*)), \]

where \(\mu > 0\) is a constant and \(x^*\) is a global minimizer of \(f\).
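As a brief aside, the standard argument that strong convexity implies the PL inequality can be sketched as follows. This is textbook material rather than something taken from the paper, and the example function at the end is chosen here purely for illustration. If \(f\) is \(\mu\)-strongly convex, then for all \(x\) and \(y\):

\[ f(y)\geq f(x)+\langle\nabla f(x),y - x\rangle+\frac{\mu}{2}\|y - x\|^2. \]

Minimizing both sides over \(y\) (the right-hand side is minimized at \(y = x-\frac{1}{\mu}\nabla f(x)\)) gives

\[ f(x^*)\geq f(x)-\frac{1}{2\mu}\|\nabla f(x)\|^2, \]

which rearranges to \(\|\nabla f(x)\|^2\geq2\mu(f(x)-f(x^*))\). The converse does not hold. For instance, \(f(x,y)=\frac{1}{2}(y-\sin x)^2\) is non-convex and its minimizers form the whole curve \(y=\sin x\), yet

\[ \|\nabla f(x,y)\|^2=(1+\cos^2 x)(y-\sin x)^2\geq(y-\sin x)^2 = 2f(x,y), \]

so it satisfies the PL inequality with \(\mu = 1\).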
This condition is weaker than strong convexity but still guarantees fast convergence of certain optimization algorithms.

### Main contributions

1. **Recovering Polyak's original convergence rate**: The paper proves that, on non-convex objective functions satisfying the PL inequality, the heavy ball method recovers the convergence rate that Polyak derived in [Pol64] for strongly convex objectives.
2. **Differential geometric perspective**: Instead of the usual Lyapunov-type arguments, the analysis leverages the differential geometric perspective on the PL inequality proposed in [Rebjock and Boumal, Math. Program., 2024].
3. **Local convergence rate in the discrete-time setting**: Under appropriate assumptions, the paper proves that in discrete time the method attains an accelerated local convergence rate once it reaches a neighborhood of the global minima.

### Conclusion

This research extends the scope of Polyak's heavy ball method and provides new theoretical tools for understanding acceleration in non-convex optimization. Assuming only the PL inequality, which is weaker than strong convexity, the paper proves accelerated convergence of the method.
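To make the summarized setting more concrete, here is a minimal Python sketch of the heavy-ball iteration on a small non-convex test problem. Everything specific in it is an assumption made for illustration and does not come from the paper: the objective `f`, the constant `A`, the starting point, the step count, and the use of the classical quadratic tuning \(\gamma = 4/(\sqrt{L}+\sqrt{\mu})^2\), \(\beta = ((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu}))^2\) with \(L\) and \(\mu\) taken as the Hessian eigenvalues at the minimizer. A short calculation suggests that this `f` satisfies the PL inequality with constant `A / 3` while being non-convex away from the origin. The run starts close to the minimizer, in line with the local nature of the discrete-time result.

```python
import numpy as np

A = 0.01  # hypothetical curvature along the valley y = sin(x); small A => ill-conditioned


def f(z):
    """Illustrative non-convex PL objective (not from the paper); minimum 0 at (0, 0)."""
    x, y = z
    return 0.5 * (A * x**2 + (y - np.sin(x)) ** 2)


def grad_f(z):
    """Gradient of f, computed by hand."""
    x, y = z
    v = y - np.sin(x)
    return np.array([A * x - v * np.cos(x), v])


def heavy_ball(x0, gamma, beta, n_steps):
    """Iterate x_{n+1} = x_n - gamma * grad f(x_n) + beta * (x_n - x_{n-1}), x_{-1} = x_0."""
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_steps):
        x_next = x - gamma * grad_f(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


# Hessian of f at the minimizer (0, 0); its eigenvalues give the local curvature range.
H0 = np.array([[A + 1.0, -1.0], [-1.0, 1.0]])
mu_loc, L_loc = np.linalg.eigvalsh(H0)  # eigenvalues in ascending order

# Classical heavy-ball tuning for quadratics (an assumption, only meaningful locally).
sqrt_L, sqrt_mu = np.sqrt(L_loc), np.sqrt(mu_loc)
gamma = 4.0 / (sqrt_L + sqrt_mu) ** 2
beta = ((sqrt_L - sqrt_mu) / (sqrt_L + sqrt_mu)) ** 2

x0 = np.array([0.002, 0.0])  # start near the minimizer: the discrete-time claim is local
n_steps = 500

f_hb = f(heavy_ball(x0, gamma, beta, n_steps))
f_gd = f(heavy_ball(x0, 1.0 / L_loc, 0.0, n_steps))  # beta = 0 is plain gradient descent

print(f"initial objective:          {f(x0):.3e}")
print(f"gradient descent, {n_steps} steps: {f_gd:.3e}")
print(f"heavy ball, {n_steps} steps:       {f_hb:.3e}")
```

On this toy problem the local condition number is roughly `4 / A`, so gradient descent contracts slowly along the valley while the momentum term is expected to give a much faster local rate. Starting far from the minimizer with the same parameters is not covered by this sketch and may behave differently.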

Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality

An accelerated lyapunov function for Polyak's Heavy-ball on convex quadratics

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition

On Convergence Rates of Linearized Proximal Algorithms for Convex Composite Optimization with Applications.

Primitive Heavy-ball Dynamics Achieves $O(\varepsilon^{-7/4})$ Convergence for Nonconvex Optimization

Heavy Ball Momentum for Non-Strongly Convex Optimization

Accelerated Over-Relaxation Heavy-Ball Methods with Provable Acceleration and Global Convergence

Linear Convergence of the Proximal Gradient Method for Composite Optimization Under the Polyak-Łojasiewicz Inequality and Its Variant

Convergence analysis of a stochastic heavy-ball method for linear ill-posed problems

A Robust Control Approach to Asymptotic Optimality of the Heavy Ball Method for Optimization of Quadratic Functions

On the Convergence Analysis of Aggregated Heavy-Ball Method

Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima

Non-Convex Stochastic Composite Optimization with Polyak Momentum

Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE

Acceleration and restart for the randomized Bregman-Kaczmarz method

Exact Convergence rate of the subgradient method by using Polyak step size

Convergence of Nonmonotone Proximal Gradient Methods under the Kurdyka-Lojasiewicz Property without a Global Lipschitz Assumption

Nonsmooth Nonconvex Stochastic Heavy Ball

A new proximal heavy ball inexact line-search algorithm

Local convergence rates for Wasserstein gradient flows and McKean-Vlasov equations with multiple stationary solutions

Local Conditions for Global Convergence of Gradient Flows and Proximal Point Sequences in Metric Spaces