Abstract: We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods in practice, the oscillation of policy iterates in these methods is not fully understood, and it leads to issues such as constraint violation and sensitivity to hyper-parameters. To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which the max/min players correspond to the primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. Specifically, we first propose a regularized policy gradient primal-dual (RPG-PD) method that simultaneously updates the policy using an entropy-regularized policy gradient and the dual variable via a quadratic-regularized gradient ascent. We prove that the policy primal-dual iterates of RPG-PD converge to a regularized saddle point at a sublinear rate, while the policy iterates converge sublinearly to an optimal constrained policy. We further instantiate RPG-PD in large state or action spaces by including function approximation in the policy parametrization, and establish similar sublinear last-iterate policy convergence. Second, we propose an optimistic policy gradient primal-dual (OPG-PD) method that employs the optimistic gradient method to update the primal/dual variables simultaneously. We prove that the policy primal-dual iterates of OPG-PD converge to a saddle point that contains an optimal constrained policy, at a linear rate. To the best of our knowledge, this is the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
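
To make the single-time-scale idea concrete, the following is a minimal sketch of an RPG-PD-style iteration on a one-state toy problem (a constrained bandit), not the authors' implementation. It assumes a regularized Lagrangian of the form ⟨π, r⟩ + λ(⟨π, g⟩ − b) + τ(H(π) + λ²/2), maximized over the policy π and minimized over λ in [0, lam_max]. The function name `rpg_pd_bandit`, the parameters `eta`, `tau`, `lam_max`, `steps`, and the exact update forms are illustrative assumptions.

```python
import math


def rpg_pd_bandit(r, g, b, eta=0.05, tau=0.05, lam_max=10.0, steps=5000):
    """Toy RPG-PD-style iteration on a one-state constrained problem.

    Goal: maximize <pi, r> subject to <pi, g> >= b over distributions pi.
    All names and default values here are hypothetical, chosen for illustration.
    """
    n = len(r)
    log_pi = [-math.log(n)] * n  # start from the uniform policy
    lam = 0.0                    # dual variable, kept in [0, lam_max]
    for _ in range(steps):
        pi = [math.exp(x) for x in log_pi]
        utility = sum(p * gi for p, gi in zip(pi, g))

        # Primal step (entropy-regularized, multiplicative-weights form):
        # log pi_new = (1 - eta*tau) * log pi + eta * (r + lam * g), then normalize.
        logits = [(1.0 - eta * tau) * lp + eta * (ri + lam * gi)
                  for lp, ri, gi in zip(log_pi, r, g)]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        new_log_pi = [x - log_z for x in logits]

        # Dual step (quadratic-regularized, projected onto [0, lam_max]).
        # It uses the same iterate (pi, lam) as the primal step, so both
        # variables move simultaneously with one shared step size eta.
        lam = min(max((1.0 - eta * tau) * lam - eta * (utility - b), 0.0), lam_max)

        log_pi = new_log_pi
    return [math.exp(x) for x in log_pi], lam


# Action 0 has the higher reward, but the constraint asks for at least
# half of the probability mass on action 1.
policy, dual = rpg_pd_bandit(r=[1.0, 0.0], g=[0.0, 1.0], b=0.5)
print(policy, dual)
```

In this toy setting the iterates settle near a regularized saddle point rather than the exact constrained optimum, so the constraint can be slightly violated for a fixed τ > 0; shrinking `tau` (with a correspondingly smaller `eta`) should reduce that bias.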

Accelerating Primal-Dual Methods for Regularized Markov Decision Processes

Convergence Rate of Primal-Dual Approach to Constrained Reinforcement Learning with Softmax Policy

Primal-Dual Regression Approach for Markov Decision Processes with General State and Action Spaces

Accelerated nonlinear primal-dual hybrid gradient methods with applications to supervised machine learning

Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes

Deterministic and Stochastic Accelerated Gradient Method for Convex Semi-Infinite Optimization

Accelerated Primal-dual Scheme for a Class of Stochastic Nonconvex-concave Saddle Point Problems

Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks

On Stochastic Primal-Dual Hybrid Gradient Approach for Compositely Regularized Minimization

Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes

Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

Accelerated Primal-Dual Proximal Gradient Splitting Methods for Convex-Concave Saddle-Point Problems

Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

Robust Accelerated Primal-Dual Methods for Computing Saddle Points

Regularized stochastic dual dynamic programming for convex nonlinear optimization problems

Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs

On the Complexity Analysis of the Primal Solutions for the Accelerated Randomized Dual Coordinate Ascent

Accelerated primal-dual methods with enlarged step sizes and operator learning for nonsmooth optimal control problems

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime