Continuous-Time Policy Optimization.

Guojian Zhan, Yuxuan Jiang, Jingliang Duan, Shengbo Eben Li, Bo Cheng, Keqiang Li
DOI: https://doi.org/10.23919/acc55779.2023.10156372
2023-01-01
Abstract: Discretized dynamics are widespread in numerical optimization and optimal control. However, physical systems are inherently continuous at the macroscopic scale, so handling the original continuous-time problem directly is desirable. In this paper, we focus on learning an optimal policy in the continuous-time finite-horizon optimal control setting. We introduce continuous-time policy optimization (CTPO), which employs the adjoint method to calculate the policy gradient and then optimizes the policy by gradient descent. In essence, CTPO minimizes the integral of the Hamiltonian over the time horizon to approach optimality, which fits the framework of Pontryagin's minimum principle. We further reveal that its intrinsic connection to the discrete-time counterpart lies in the different order of the differentiation and discretization operations. Finally, we conduct experiments on a linear quadratic regulator (LQR) and a nonlinear vehicle trajectory tracking task. The results demonstrate that the trained policy retains continuous-time system information and achieves high accuracy.
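The adjoint-based gradient described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical scalar LQR with dynamics dx/dt = a*x + b*u, running cost q*x^2 + r*u^2, no terminal cost, and a one-parameter linear policy u = -k*x. All constants, function names, and integrator choices (RK4 forward, Heun backward, trapezoidal quadrature) are assumptions made for illustration. With the Hamiltonian H = (q + r*k^2)*x^2 + lam*(a - b*k)*x, the adjoint satisfies dlam/dt = -dH/dx with lam(T) = 0, and the policy gradient is the time integral of dH/dk.

```python
import numpy as np

# Hypothetical scalar problem (all values are illustrative assumptions).
a, b, q, r = 1.0, 1.0, 1.0, 0.5   # dynamics and cost coefficients
x0, T, N = 1.0, 1.0, 1000         # initial state, horizon, grid intervals
dt = T / N

def trapezoid(y):
    """Trapezoidal quadrature of samples y on the uniform time grid."""
    return dt * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

def rollout(k):
    """Integrate the closed-loop state dx/dt = (a - b*k)*x forward with RK4."""
    f = lambda x: (a - b * k) * x
    xs = np.empty(N + 1)
    xs[0] = x0
    for i in range(N):
        x = xs[i]
        s1 = f(x)
        s2 = f(x + 0.5 * dt * s1)
        s3 = f(x + 0.5 * dt * s2)
        s4 = f(x + dt * s3)
        xs[i + 1] = x + dt * (s1 + 2 * s2 + 2 * s3 + s4) / 6
    return xs

def cost(k):
    """Integral cost J(k) under the policy u = -k*x."""
    xs = rollout(k)
    return trapezoid((q + r * k**2) * xs**2)

def adjoint_gradient(k):
    """dJ/dk via the adjoint: solve lam backward, then integrate dH/dk."""
    xs = rollout(k)
    # Adjoint ODE: dlam/dt = -dH/dx = -2*(q + r*k^2)*x - (a - b*k)*lam
    g = lambda x, lam: -2 * (q + r * k**2) * x - (a - b * k) * lam
    lam = np.zeros(N + 1)          # terminal condition lam(T) = 0
    for i in range(N - 1, -1, -1):
        # Heun's method stepping backward in time from t[i+1] to t[i].
        g1 = g(xs[i + 1], lam[i + 1])
        lam_pred = lam[i + 1] - dt * g1
        g2 = g(xs[i], lam_pred)
        lam[i] = lam[i + 1] - 0.5 * dt * (g1 + g2)
    # dH/dk = 2*r*k*x^2 - lam*b*x, integrated over the horizon.
    return trapezoid(2 * r * k * xs**2 - lam * b * xs)

# Plain gradient descent on the policy parameter k.
k_init = 0.0
k_trained = k_init
for _ in range(100):
    k_trained -= 0.1 * adjoint_gradient(k_trained)
```

Here the gradient is derived in continuous time and only then evaluated numerically, which is the "differentiate, then discretize" order; differentiating a time-stepped rollout directly would reverse that order. The step size of 0.1 and the 100 iterations are arbitrary choices for this toy problem.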