Abstract: In this work, we consider the convergence of Polyak's heavy ball method, both in continuous and discrete time, on a non-convex objective function. We recover the convergence rates derived in [Polyak, U.S.S.R. Comput. Math. and Math. Phys., 1964] for strongly convex objective functions, assuming only validity of the Polyak-Lojasiewicz inequality. In continuous time our result holds for all initializations, whereas in the discrete time setting we conduct a local analysis around the global minima. Our results demonstrate that the heavy ball method does, in fact, accelerate on the class of objective functions satisfying the Polyak-Lojasiewicz inequality. This holds even in the discrete time setting, provided the method reaches a neighborhood of the global minima. Instead of the usually employed Lyapunov-type arguments, our approach leverages a new differential geometric perspective of the Polyak-Lojasiewicz inequality proposed in [Rebjock and Boumal, Math. Program., 2024].
What problem does this paper attempt to address?
This paper addresses the convergence analysis of Polyak's heavy ball method on non-convex objective functions that satisfy the Polyak-Łojasiewicz (PL) inequality. Specifically, the paper aims to prove:
1. **Global convergence rate in the continuous-time setting**: For all initializations, Polyak's heavy ball method achieves an accelerated convergence rate in continuous time.
2. **Local convergence rate in the discrete-time setting**: Once the method enters a neighborhood of the global minima, it also achieves an accelerated convergence rate in discrete time.
### Detailed problem description
#### Research background
In optimization problems, especially in machine-learning tasks, first-order optimization methods such as gradient descent play a central role. To accelerate convergence, momentum methods update the iterates using stored information from past gradients, which smooths the oscillations of standard gradient descent and speeds up convergence on ill-conditioned problems. However, relatively few studies have proven these advantages theoretically, and the existing results are usually limited to specific classes of objective functions.
#### Polyak's heavy ball method
Polyak's heavy ball method is an optimization algorithm with a momentum term, and its iteration formula is:
\[ x_{n + 1}=x_n-\gamma\nabla f(x_n)+\beta(x_n - x_{n - 1}), \]
where \(\gamma\) is the learning rate and \(\beta\) is the momentum parameter. This method can be regarded as the discrete form of the heavy-ball differential equation:
\[ \dot{x}_t = v_t, \]
\[ \dot{v}_t=-\alpha v_t-\nabla f(x_t), \]
where \(\alpha\) is the friction parameter.
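One way to make this connection concrete is a finite-difference sketch. The step size \(h\) and the particular difference quotients are assumptions chosen for illustration, not necessarily the discretization used in the paper. Eliminating \(v_t\) turns the system into the second-order equation
\[ \ddot{x}_t+\alpha\dot{x}_t+\nabla f(x_t)=0. \]
Replacing \(\ddot{x}_t\) by a central difference and \(\dot{x}_t\) by a backward difference gives
\[ \frac{x_{n + 1}-2x_n+x_{n - 1}}{h^2}+\alpha\frac{x_n - x_{n - 1}}{h}+\nabla f(x_n)=0, \]
which rearranges to
\[ x_{n + 1}=x_n-h^2\nabla f(x_n)+(1-\alpha h)(x_n - x_{n - 1}). \]
This matches the iteration above with \(\gamma = h^2\) and \(\beta = 1-\alpha h\).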
#### PL inequality
The PL inequality is defined as:
\[ \|\nabla f(x)\|^2\geq2\mu(f(x)-f(x^*)), \]
where \(\mu > 0\) is a constant and \(x^*\) is a global minimizer, which need not be unique. This condition is weaker than strong convexity and does not require convexity at all, yet it still guarantees fast (linear) convergence of methods such as gradient descent.
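To see that the inequality above can hold without convexity, the following minimal sketch checks it numerically for the one-dimensional function \(f(x) = x^2 + 3\sin^2(x)\), whose global minimum is \(f(0) = 0\). The function, the grid, and all variable names are illustrative assumptions and are not taken from the paper. The check is a grid-based estimate, not a proof.

```python
import numpy as np

# Illustrative non-convex test function (an assumption, not from the paper).
def f(x):
    return x**2 + 3.0 * np.sin(x)**2

def grad_f(x):
    return 2.0 * x + 3.0 * np.sin(2.0 * x)

def hess_f(x):
    return 2.0 + 6.0 * np.cos(2.0 * x)

xs = np.linspace(-10.0, 10.0, 20001)

# f(x*) = 0 at x* = 0, so the optimality gap is f itself.
gap = f(xs)
mask = gap > 1e-12  # skip the minimizer to avoid dividing by zero

# PL reads |f'(x)|^2 >= 2 * mu * (f(x) - f(x*)); the smallest ratio
# |f'(x)|^2 / (2 * gap) on the grid is an estimate of the best mu there.
pl_ratio = grad_f(xs[mask])**2 / (2.0 * gap[mask])
mu_estimate = pl_ratio.min()

# A negative second derivative somewhere means f is not convex.
is_nonconvex = bool((hess_f(xs) < 0.0).any())

print(f"estimated PL constant on the grid: {mu_estimate:.4f}")
print(f"non-convex on the grid: {is_nonconvex}")
```

A strictly positive estimate together with a negative second derivative somewhere on the grid is consistent with a function that satisfies the PL inequality but is not convex.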
### Main contributions
1. **Recovering Polyak's original convergence rate**: The paper proves that on non-convex objective functions satisfying the PL inequality, Polyak's heavy ball method attains the convergence rate that Polyak derived in [Pol64] for strongly convex objective functions.
2. **Geometric perspective**: Instead of the usual Lyapunov-type arguments, the analysis builds on the differential geometric perspective of the PL inequality proposed in [Rebjock and Boumal, Math. Program., 2024].
3. **Local convergence rate in the discrete-time setting**: The paper proves that in discrete time, once the method enters a neighborhood of the global minima, it achieves an accelerated local convergence rate.
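To illustrate what acceleration means in practice, the sketch below compares heavy ball with plain gradient descent on an ill-conditioned quadratic, which is strongly convex and therefore also satisfies the PL inequality. The quadratic, the constants `mu = 1` and `L = 100`, and the helper name `heavy_ball` are assumptions made for illustration. The parameters \(\gamma = 4/(\sqrt{L}+\sqrt{\mu})^2\) and \(\beta = ((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu}))^2\) are the classical tuning for quadratics, which may differ from the exact choices analyzed in the paper.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 * sum(eigs * x**2) with condition number 100.
mu, L = 1.0, 100.0
eigs = np.array([mu, L])

def grad(x):
    return eigs * x

def heavy_ball(grad, x0, gamma, beta, n_steps):
    """Run x_{n+1} = x_n - gamma * grad(x_n) + beta * (x_n - x_{n-1}).

    Starts with x_{-1} = x_0 (zero initial velocity); beta = 0 gives
    plain gradient descent.
    """
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_steps):
        x_next = x - gamma * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Classical heavy ball tuning for quadratics (assumed here for illustration).
gamma_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu))**2
beta_hb = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu)))**2

# Gradient descent with the standard step size 2 / (L + mu).
gamma_gd = 2.0 / (L + mu)

x0 = [1.0, 1.0]
n_steps = 100

# The minimizer is the origin, so the norm of the iterate is the error.
err_hb = np.linalg.norm(heavy_ball(grad, x0, gamma_hb, beta_hb, n_steps))
err_gd = np.linalg.norm(heavy_ball(grad, x0, gamma_gd, 0.0, n_steps))

print(f"heavy ball error after {n_steps} steps:       {err_hb:.3e}")
print(f"gradient descent error after {n_steps} steps: {err_gd:.3e}")
```

On this example the per-step contraction factor of gradient descent is \((L-\mu)/(L+\mu) = 99/101\), while heavy ball contracts asymptotically like \((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu}) = 9/11\), so the gap between the two errors widens quickly as the number of steps grows.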
### Conclusion
This research extends the class of objective functions for which Polyak's heavy ball method is known to accelerate, and it provides new theoretical tools for understanding acceleration in non-convex optimization. By assuming only the PL inequality, which is weaker than strong convexity, the paper proves accelerated convergence globally in continuous time and locally in discrete time.