Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality

Sebastian Kassing, Simon Weissmann
2024-10-22
Abstract: In this work, we consider the convergence of Polyak's heavy ball method, both in continuous and discrete time, on a non-convex objective function. We recover the convergence rates derived in [Polyak, U.S.S.R. Comput. Math. and Math. Phys., 1964] for strongly convex objective functions, assuming only validity of the Polyak-Lojasiewicz inequality. In continuous time our result holds for all initializations, whereas in the discrete time setting we conduct a local analysis around the global minima. Our results demonstrate that the heavy ball method does, in fact, accelerate on the class of objective functions satisfying the Polyak-Lojasiewicz inequality. This holds even in the discrete time setting, provided the method reaches a neighborhood of the global minima. Instead of the usually employed Lyapunov-type arguments, our approach leverages a new differential geometric perspective of the Polyak-Lojasiewicz inequality proposed in [Rebjock and Boumal, Math. Program., 2024].
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
This paper analyzes the convergence of Polyak's heavy ball method on non-convex objective functions that satisfy the Polyak-Łojasiewicz (PL) inequality. Specifically, the authors aim to prove:

1. **Global convergence rate in the continuous-time setting**: In continuous time, Polyak's heavy ball method achieves the accelerated convergence rate for all initializations.
2. **Local convergence rate in the discrete-time setting**: In discrete time, the method achieves the accelerated convergence rate locally, that is, once the iterates enter a neighborhood of the global minima.

### Detailed problem description

#### Research background

In optimization problems, especially in machine-learning tasks, first-order optimization methods such as gradient descent play a central role. To accelerate convergence, momentum methods update the iterate using stored information from past gradients, which smooths the oscillations of standard gradient descent and speeds up convergence on ill-conditioned problems. However, relatively few studies have proven these advantages theoretically, and the existing results are usually limited to specific classes of objective functions.

#### Polyak's heavy ball method

Polyak's heavy ball method is an optimization algorithm with a momentum term. Its iteration is

\[ x_{n + 1}=x_n-\gamma\nabla f(x_n)+\beta(x_n - x_{n - 1}), \]

where \(\gamma\) is the learning rate and \(\beta\) is the momentum parameter. The method can be regarded as a discretization of the heavy ball differential equation

\[ \dot{x}_t = v_t, \]
\[ \dot{v}_t=-\alpha v_t-\nabla f(x_t), \]

where \(\alpha\) is the friction parameter.

#### PL inequality

The PL inequality is defined as

\[ \|\nabla f(x)\|^2\geq2\mu(f(x)-f(x^*)), \]

where \(\mu > 0\) is a constant and \(x^*\) is a global minimizer of \(f\).
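To make the PL inequality concrete, the following minimal sketch estimates the PL constant of a non-convex function on a grid. The test function \(f(x) = x^2 + 3\sin^2(x)\), the interval \([-10, 10]\), and the helper names `pl_ratio`, `mu_estimate`, and `is_nonconvex` are illustrative assumptions and are not taken from the paper. A grid check of this kind is only a numerical illustration, not a proof.

```python
import numpy as np

# Illustrative (assumed) test function: f(x) = x^2 + 3 sin^2(x).
# Its global minimum is f(0) = 0, and it is non-convex because
# f''(x) = 2 + 6 cos(2x) becomes negative for some x.

def f(x):
    return x**2 + 3.0 * np.sin(x)**2

def grad_f(x):
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * np.sin(2.0 * x)

def second_derivative_f(x):
    return 2.0 + 6.0 * np.cos(2.0 * x)

def pl_ratio(x, f_star=0.0):
    # The PL inequality |f'(x)|^2 >= 2 mu (f(x) - f*) holds at x
    # exactly when this ratio is at least mu.
    return grad_f(x)**2 / (2.0 * (f(x) - f_star))

# An even number of points on a symmetric interval never samples x = 0,
# where the ratio would be 0/0.
xs = np.linspace(-10.0, 10.0, 200_000)

mu_estimate = pl_ratio(xs).min()          # grid estimate of the PL constant
is_nonconvex = bool((second_derivative_f(xs) < 0).any())

print(f"grid estimate of mu: {mu_estimate:.4f}")
print(f"negative curvature found: {is_nonconvex}")
```

On the sampled interval the ratio stays bounded away from zero even though the second derivative changes sign, which is the situation the PL inequality is meant to capture.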
This condition is weaker than strong convexity, but it still guarantees linear convergence of gradient descent for functions with Lipschitz-continuous gradient.

### Main contributions

1. **Recovering Polyak's original convergence rate**: The authors prove that, on non-convex objective functions satisfying the PL inequality, Polyak's heavy ball method recovers the convergence rate that Polyak derived in [Pol64] for strongly convex objective functions.
2. **New geometric perspective**: Instead of the usual Lyapunov-type arguments, the authors build on the differential geometric perspective of the PL inequality proposed in [Rebjock and Boumal, Math. Program., 2024].
3. **Local convergence rate in the discrete-time setting**: Under appropriate assumptions, the authors prove that in discrete time the accelerated convergence rate is achieved locally, as soon as the method enters a neighborhood of the global minima.

### Conclusion

This research extends the class of objective functions on which Polyak's heavy ball method is known to accelerate, and it provides new theoretical tools for understanding acceleration in non-convex optimization. Assuming only the PL inequality, which is weaker than strong convexity, the authors prove accelerated convergence of the method: for all initializations in continuous time, and locally around the global minima in discrete time.
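As an illustration of the discrete iteration \(x_{n+1}=x_n-\gamma\nabla f(x_n)+\beta(x_n-x_{n-1})\), the sketch below runs heavy ball and plain gradient descent on a degenerate quadratic. This is a hypothetical toy problem, not an experiment from the paper: the objective has a zero eigenvalue, so it is not strongly convex, but it satisfies the PL inequality with \(\mu\) equal to the smallest nonzero eigenvalue. The parameters \(\gamma = 4/(\sqrt{L}+\sqrt{\mu})^2\) and \(\beta = ((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu}))^2\) are the classical choices for quadratics and are assumed here; the paper's own parameter conditions may differ.

```python
import numpy as np

# Assumed toy objective: f(x) = 0.5 * sum_i lambda_i * x_i^2 with one zero
# eigenvalue. The minimizers form a line, so f is not strongly convex,
# but it is PL with mu = 1 (smallest nonzero eigenvalue) and L = 100.
eigenvalues = np.array([0.0, 1.0, 100.0])
mu, L = 1.0, 100.0

def f(x):
    return 0.5 * np.sum(eigenvalues * x**2)

def grad_f(x):
    return eigenvalues * x

def heavy_ball(x0, gamma, beta, num_steps):
    """Run x_{n+1} = x_n - gamma * grad f(x_n) + beta * (x_n - x_{n-1}).

    Starts with x_{-1} = x_0, i.e. zero initial momentum.
    Setting beta = 0 gives plain gradient descent.
    """
    x_prev = x0.copy()
    x = x0.copy()
    values = [f(x)]
    for _ in range(num_steps):
        x_next = x - gamma * grad_f(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
        values.append(f(x))
    return x, np.array(values)

x0 = np.array([1.0, 1.0, 1.0])
num_steps = 200

# Classical heavy ball parameters for quadratics (an assumption here).
gamma_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu))**2
beta_hb = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu)))**2

x_hb, values_hb = heavy_ball(x0, gamma_hb, beta_hb, num_steps)
# Gradient descent baseline with the step size 2 / (L + mu).
x_gd, values_gd = heavy_ball(x0, 2.0 / (L + mu), 0.0, num_steps)

print(f"f after {num_steps} heavy ball steps:       {values_hb[-1]:.3e}")
print(f"f after {num_steps} gradient descent steps: {values_gd[-1]:.3e}")
```

The coordinate along the zero eigenvalue never moves, so both methods converge to a point on the line of minimizers rather than to a unique minimum, while the objective value decreases markedly faster under heavy ball on this ill-conditioned example.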