Abstract: Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations. Despite the considerable effort directed at the design of efficient stochastic PG-type algorithms, the understanding of their convergence to a globally optimal policy is still limited. In this work, we develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies, which allows us to address the case of continuous state-action spaces. First, we propose a Normalized Policy Gradient method with Implicit Gradient Transport (N-PG-IGT) and derive a $\tilde{\mathcal{O}}(\varepsilon^{-2.5})$ sample complexity of this method for finding a global $\varepsilon$-optimal policy. Improving over the previously known $\tilde{\mathcal{O}}(\varepsilon^{-3})$ complexity, this algorithm does not require importance sampling or second-order information and samples only one trajectory per iteration. Second, we further improve this complexity to $\tilde{\mathcal{O}}(\varepsilon^{-2})$ by considering a Hessian-Aided Recursive Policy Gradient ((N)-HARPG) algorithm enhanced with a correction based on a Hessian-vector product. Interestingly, both algorithms are $(i)$ simple and easy to implement: they are single-loop, do not require large batches of trajectories, and sample at most two trajectories per iteration; $(ii)$ computationally and memory efficient: they do not require expensive subroutines at each iteration and can be implemented with memory linear in the dimension of parameters.
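
To give a rough feel for what a normalized, single-sample update with a look-ahead gradient query might look like, here is a minimal sketch on a toy concave objective rather than an actual MDP. The update form (an extrapolated query point, a momentum average of stochastic gradients, and a unit-norm ascent step) is an assumption based on how implicit gradient transport is commonly described, not something stated in the abstract. The names `normalized_igt_ascent` and `noisy_grad`, the step sizes `gamma` and `eta`, and the noisy quadratic are all hypothetical choices made for illustration.

```python
import numpy as np


def normalized_igt_ascent(grad_estimate, theta0, steps=2000, gamma=0.01, eta=0.1, seed=0):
    """Toy normalized ascent with an IGT-style look-ahead (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    theta_prev = np.array(theta0, dtype=float)
    theta = theta_prev.copy()
    d = np.zeros_like(theta)  # momentum-averaged gradient estimate
    for _ in range(steps):
        # Assumed IGT step: query the gradient at an extrapolated point,
        # not at the current iterate itself.
        theta_tilde = theta + (1.0 - eta) / eta * (theta - theta_prev)
        # One stochastic gradient sample per iteration.
        g = grad_estimate(theta_tilde, rng)
        d = (1.0 - eta) * d + eta * g
        norm = np.linalg.norm(d)
        theta_prev = theta
        if norm > 0.0:
            # Normalized step: every move has length gamma.
            theta = theta + gamma * d / norm
    return theta


def noisy_grad(theta, rng):
    # Gradient of the toy objective J(theta) = -0.5 * ||theta - 1||^2,
    # corrupted by Gaussian noise to mimic a single-trajectory estimate.
    return -(theta - 1.0) + 0.5 * rng.standard_normal(theta.shape)


theta_final = normalized_igt_ascent(noisy_grad, theta0=np.zeros(3))
print(np.linalg.norm(theta_final - 1.0))  # distance to the toy maximizer
```

In this sketch the normalization keeps every step at length `gamma` regardless of gradient noise, while the extrapolated query point is meant to offset the lag that plain momentum averaging introduces. No importance weights or Hessian information appear, which matches the properties the abstract attributes to N-PG-IGT, though the actual algorithm and its step-size schedule may differ.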

Stochastic Recursive Momentum for Policy Gradient Methods

Policy Optimization with Stochastic Mirror Descent

Stochastic Cubic-Regularized Policy Gradient Method

Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

Stochastic Momentum Method with Double Acceleration for Regularized Empirical Risk Minimization

Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement Learning

Stochastic Variance-Reduced Policy Gradient

Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies

Stochastic Recursive Momentum Method for Non-Convex Compositional Optimization

Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning

Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization

Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning

A Temporal-Difference Approach to Policy Gradient Estimation

Decentralized Multi-Task Reinforcement Learning Policy Gradient Method with Momentum over Networks

Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes

A Distributed Adaptive Policy Gradient Method Based on Momentum for Multi-Agent Reinforcement Learning

Policy Gradient with Active Importance Sampling

Hessian Aided Policy Gradient

Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction

Efficient sample reuse in policy gradients with parameter-based exploration

Gradient Temporal Difference with Momentum: Stability and Convergence