Projected Policy Gradient Converges in a Finite Number of Iterations

Jiacai Liu, Wenye Li, Wei Ke
DOI: https://doi.org/10.48550/arxiv.2311.01104
2023-01-01
Abstract: The convergence of the projected policy gradient (PPG) method under the simplex parameterization is studied, and it is shown that the method achieves exact convergence in a finite number of iterations for any constant step size. To obtain this result, we first establish the sublinear convergence of PPG for an arbitrary fixed step size, which is also new, to the best of our knowledge. The finite iteration convergence property also holds for a preconditioned version of PPG, namely the projected Q-ascent (PQA) method. Additionally, the linear convergence of PPG and its equivalence to policy iteration (PI) are established under non-adaptive increasing step sizes and adaptive step sizes, respectively.
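As a rough illustration of the projection step these methods rely on, the sketch below implements the standard sort-based Euclidean projection onto the probability simplex and applies it to a single-state update of the form pi_s + eta * Q(s, ·). The function names, the Q-values, and the step sizes are hypothetical choices for illustration, and the PQA-style update shown here is an assumed reading of "preconditioned PPG", not code taken from the paper.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # the last k that satisfies the condition fixes the shift
    return [max(x - theta, 0.0) for x in v]


def projected_q_ascent_step(pi_s, q_s, eta):
    """One PQA-style update at a single state (assumed form): project pi_s + eta * q_s."""
    return project_to_simplex([p + eta * q for p, q in zip(pi_s, q_s)])


# Hypothetical policy and Q-values at one state with three actions.
pi_s = [0.5, 0.3, 0.2]
q_s = [1.0, 2.0, 0.5]

small_step = projected_q_ascent_step(pi_s, q_s, eta=0.1)   # stays in the interior
large_step = projected_q_ascent_step(pi_s, q_s, eta=10.0)  # collapses onto the greedy action
```

With a small step size the projected policy shifts some mass toward the action with the highest Q-value; with a large enough step size the projection puts all mass on that action, which gives some intuition for the stated link between these updates and policy iteration.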