Hamiltonian-Driven Adaptive Dynamic Programming With Approximation Errors

Yongliang Yang, Hamidreza Modares, Kyriakos G. Vamvoudakis, Wei He, Cheng-Zhong Xu, Donald C. Wunsch
DOI: https://doi.org/10.1109/tcyb.2021.3108034
IF: 11.8
2021-01-01
IEEE Transactions on Cybernetics
Abstract: In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control problem in continuous time for nonlinear systems. First, a novel function, "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that takes into consideration the approximation errors during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient to guarantee closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.
automation & control systems,computer science, cybernetics, artificial intelligence
What problem does this paper attempt to address?
This paper addresses the problem of solving the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control of continuous-time nonlinear systems with an iterative adaptive dynamic programming (ADP) algorithm. Specifically, it proposes a new function, the "min-Hamiltonian," to capture the essential properties of the classical Hamiltonian and builds an iterative ADP algorithm on it. The algorithm accounts for approximation errors in the policy evaluation step while still ensuring closed-loop stability and convergence to the optimal value. The paper also provides a model-free extension, based on off-policy reinforcement learning (RL), that avoids the need for full knowledge of the system dynamics.

### Main Contributions of the Paper:

1. **Definition of the "Min-Hamiltonian"**: Unifies the HJB equation and the policy iteration (PI) algorithm, allowing quasi-Newton methods to be used to solve the HJB equation iteratively.
2. **Analysis of Approximation Error Impact**: Investigates how closed-loop stability and performance guarantees depend on the approximation residuals produced by inexact policy evaluation, and derives a sufficient condition that ensures convergence and closed-loop stability of the iterative learning algorithm.
3. **Robustness**: The proposed inexact method is robust to bounded approximation errors in terms of both stability and performance improvement.
4. **Special Case for Linear Systems**: For linear systems, the iterative ADP algorithm reduces to the Newton-Kleinman iteration.

### Structure of the Paper:

- **Part II**: Problem statement, defining continuous-time nonlinear systems and their cost functions.
- **Part III**: Review of the Hamiltonian-driven framework for exact ADP.
- **Part IV**: Extension of the exact Hamiltonian-driven ADP to the inexact case.
- **Part VI**: Case studies of linear and nonlinear dynamical systems.
- **Part VII**: Conclusion and future research directions.

### Key Concepts:

- **Hamiltonian**: An important tool for evaluating any feasible policy.
- **Min-Hamiltonian**: Defined as \( h(x, p) = -\frac{1}{4} p^T g(x) R^{-1} g^T(x) p + p^T f(x) + Q(x) \).
- **Policy Evaluation**: Calculation of the cost of a given policy.
- **Policy Comparison**: Comparison of the performance of two different feasible policies.
- **Policy Improvement**: Design of an improved policy based on the current feasible policy.

### Main Results:

- **Convergence**: Under a sufficient condition on the iterative value gradient, the iterative ADP algorithm converges monotonically to the optimal value function.
- **Stability**: The iterative policies stabilize the closed-loop system.
- **Handling Approximation Errors**: Convergence and stability are preserved in the presence of bounded approximation errors.

In summary, this paper introduces a new iterative ADP algorithm built on the "min-Hamiltonian" that accounts for approximation errors, addresses the optimal control problem for nonlinear systems, and provides theoretical and numerical validation.
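
The linear special case mentioned above can be made concrete with a scalar example. What follows is a minimal sketch, not the paper's implementation. It assumes scalar dynamics \( \dot{x} = a x + b u \), a cost integrand \( q x^2 + r u^2 \), and a quadratic value \( V(x) = p x^2 \), so that the min-Hamiltonian evaluated at the value gradient \( 2 p x \) becomes \( x^2 (2 a p - b^2 p^2 / r + q) \), which is the scalar Riccati residual. All function names and numerical values are hypothetical, and policy evaluation is exact here (no approximation error is modeled).

```python
import math


def evaluate_policy(a, b, q, r, k):
    """Policy evaluation for u = -k*x: solve 2*(a - b*k)*p + q + r*k**2 = 0 for p."""
    closed_loop = a - b * k
    if closed_loop >= 0.0:
        raise ValueError("gain k must stabilize the system (a - b*k < 0)")
    return (q + r * k * k) / (-2.0 * closed_loop)


def improve_policy(b, r, p):
    """Policy improvement: gain of the Hamiltonian minimizer for V(x) = p*x**2."""
    return b * p / r


def riccati_residual(a, b, q, r, p):
    """Min-Hamiltonian of V(x) = p*x**2 divided by x**2 (scalar Riccati residual)."""
    return 2.0 * a * p - (b * p) ** 2 / r + q


def kleinman(a, b, q, r, k0, iterations=8):
    """Run exact policy iteration; return the lists of values p_i and gains k_i."""
    gains, values = [k0], []
    for _ in range(iterations):
        p = evaluate_policy(a, b, q, r, gains[-1])
        values.append(p)
        gains.append(improve_policy(b, r, p))
    return values, gains


def riccati_solution(a, b, q, r):
    """Closed-form stabilizing root of the scalar Riccati equation."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    # Illustrative numbers only; k0 = 3 stabilizes the unstable plant a = 1.
    values, gains = kleinman(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    for i, p in enumerate(values):
        print(i, p, riccati_residual(1.0, 1.0, 1.0, 1.0, p))
    print("closed form:", riccati_solution(1.0, 1.0, 1.0, 1.0))
```

With these illustrative numbers the evaluated values start at 2.5 and decrease toward \( 1 + \sqrt{2} \), the stabilizing Riccati root, while every intermediate gain keeps the closed loop stable. Each policy-improvement step is a Newton step on the Riccati residual, which is why the iteration is called Newton-Kleinman in the linear case.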