Policy Optimization Adaptive Dynamic Programming for Optimal Control of Input-Affine Discrete-Time Nonlinear Systems.

Mingduo Lin, Bo Zhao
DOI: https://doi.org/10.1109/tsmc.2023.3247466
2023-01-01
IEEE Transactions on Systems Man and Cybernetics Systems
Abstract: In this article, a policy optimization adaptive dynamic programming (POADP) method is developed for optimal control of discrete-time unknown nonlinear systems, where the iterative control policy is parameterized to optimize the iterative $Q$-function directly. A relaxed condition on the learning rate is given to guarantee the convergence of the algorithm. Furthermore, the Polyak–Łojasiewicz inequality is introduced to analyze optimality, i.e., the iterative $Q$-function converges to the optimum within a given computational threshold in a finite number of iterations, and the rate of convergence (i.e., the required minimum number of iterations) of the developed POADP method is also given. To ease practical implementation, the iterative $Q$-function and the iterative control policy are approximated using an actor–critic structure. Then, an experiment-based method is developed to obtain the initial weights of the actor–critic structure. Finally, numerical simulation results for two examples are provided to validate the effectiveness of the POADP algorithm.
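For readers unfamiliar with how a Polyak–Łojasiewicz (PL) condition yields a minimum iteration count, the standard textbook argument for plain gradient descent is sketched below. This is a generic illustration, not the paper's derivation: the objective $J$, parameter vector $\theta$, PL constant $\mu$, smoothness constant $L$ (with $\mu < L$), step size $\alpha$, and threshold $\varepsilon$ are assumed symbols, and the POADP analysis may use different conditions and constants.

```latex
% Generic PL-based rate for gradient descent on a smooth objective J.
% J, \theta, \mu, L, \alpha, \varepsilon are illustrative symbols,
% not the notation of the paper.
\begin{align}
  \tfrac{1}{2}\,\lVert \nabla J(\theta) \rVert^{2}
    &\ge \mu \bigl( J(\theta) - J^{*} \bigr)
    && \text{(PL inequality, } \mu > 0\text{)} \\
  \theta_{k+1}
    &= \theta_{k} - \alpha \nabla J(\theta_{k}), \quad \alpha = \tfrac{1}{L}
    && \text{(} \nabla J \text{ is } L\text{-Lipschitz)} \\
  J(\theta_{k}) - J^{*}
    &\le \Bigl( 1 - \tfrac{\mu}{L} \Bigr)^{k} \bigl( J(\theta_{0}) - J^{*} \bigr)
    && \text{(linear convergence)} \\
  k \ge \frac{\ln\bigl( (J(\theta_{0}) - J^{*}) / \varepsilon \bigr)}
             {\ln\bigl( 1 / (1 - \mu/L) \bigr)}
    &\;\Longrightarrow\; J(\theta_{k}) - J^{*} \le \varepsilon
    && \text{(minimum iteration count)}
\end{align}
```

The last line shows why a PL-type condition gives a finite iteration bound for any threshold $\varepsilon > 0$ without requiring convexity of the objective.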