Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction

Jie Feng, Ke Wei, Jinchi Chen
DOI: https://doi.org/10.1007/s10915-024-02688-x
2024-10-05
Journal of Scientific Computing
Abstract: Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant termed NPG-HM is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via the stochastic gradient descent method. It is shown that NPG-HM can achieve the global last iterate $\epsilon$-optimality with a sample complexity of $\mathcal{O}(\epsilon^{-2})$, which is the best known result for natural policy gradient type methods under the generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
mathematics, applied