Stochastic Cubic-Regularized Policy Gradient Method

Pengfei Wang, Hongyu Wang, Nenggan Zheng
DOI: https://doi.org/10.1016/j.knosys.2022.109687
IF: 8.139
2022-01-01
Knowledge-Based Systems
Abstract: Policy-based reinforcement learning methods have achieved great success in real-world decision-making problems. However, the theoretical understanding of policy-based methods is still limited. Specifically, existing works mainly focus on first-order stationary point policies (FOSPs); only in some very special reinforcement learning settings (e.g., the tabular case and function approximation with restricted parametric policy classes) do some works consider globally optimal policies. It is well known that FOSPs can be undesirable local optima or saddle points, and that obtaining a global optimum is generally NP-hard. In this paper, we propose a policy gradient method that provably converges to second-order stationary point policies (SOSPs) for any differentiable policy class. The proposed method is computationally efficient: it judiciously uses cubic-regularized subroutines to escape saddle points while minimizing the Hessian-based computations. We prove that the method enjoys a sample complexity of $\tilde{O}(\epsilon^{-3.5})$, which improves upon the previous best-known complexity of $\tilde{O}(\epsilon^{-4.5})$. Finally, experimental results are provided to demonstrate the effectiveness of the method.
• We propose a new algorithm, SCR-PG, which retains the good properties of previous works.
• We provide a non-asymptotic, high-probability analysis of SCR-PG's complexity.
• Experimental results are presented to validate the superior performance of SCR-PG.
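To make the idea of escaping a saddle point with a cubic-regularized subroutine concrete, here is a minimal sketch in Python. It is not SCR-PG: it is a deterministic toy using a generic cubic-regularized Newton step, with exact gradients and Hessian-vector products in place of the stochastic estimates a policy gradient method would use. The objective f, the penalty M, the inner step size lr, the iteration counts, and all function names are assumptions chosen for illustration. The toy is written as a minimization problem; f has a saddle point at the origin and minima at (0, 1) and (0, -1).

```python
import math
import random

M = 6.0  # assumed cubic penalty; should upper-bound the Hessian's Lipschitz constant


def f(theta):
    # Toy objective (an assumption): saddle at (0, 0), minima at (0, +1) and (0, -1).
    x, y = theta
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2


def grad(theta):
    x, y = theta
    return [x, y**3 - y]


def hess_vec(theta, v):
    # The Hessian is diag(1, 3y^2 - 1), so the Hessian-vector product is elementwise.
    _, y = theta
    return [v[0], (3.0 * y**2 - 1.0) * v[1]]


def cubic_step(theta, rng, lr=0.05, iters=500):
    """Approximately minimize g.h + 0.5 h.H.h + (M/6)||h||^3 by gradient descent on h."""
    g = grad(theta)
    # Small random start so the inner solver can leave h = 0 when g = 0.
    h = [1e-3 * rng.gauss(0.0, 1.0) for _ in range(2)]
    for _ in range(iters):
        Hh = hess_vec(theta, h)
        norm_h = math.hypot(h[0], h[1])
        # Gradient of the cubic model with respect to h.
        model_grad = [g[i] + Hh[i] + 0.5 * M * norm_h * h[i] for i in range(2)]
        h = [h[i] - lr * model_grad[i] for i in range(2)]
    return h


def run(steps=20, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # exact saddle point: the gradient is zero here
    for _ in range(steps):
        h = cubic_step(theta, rng)
        theta = [theta[0] + h[0], theta[1] + h[1]]
    return theta


theta_final = run()
print(theta_final, f(theta_final))
```

Plain gradient descent started at the origin would never move, because the gradient there is exactly zero. The cubic model instead uses the negative curvature in the Hessian to step away from the saddle, and the cubic penalty keeps that step bounded.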