Abstract: In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control problem in continuous time for nonlinear systems. First, a novel function, "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that takes into consideration the approximation errors during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient to guarantee closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.

What problem does this paper attempt to address?

This paper addresses the problem of solving the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control of continuous-time nonlinear systems using an iterative adaptive dynamic programming (ADP) algorithm. Specifically, it proposes a new function, the "min-Hamiltonian," to capture the essential properties of the classical Hamiltonian, and builds an iterative ADP algorithm on it. The algorithm accounts for approximation errors in the policy evaluation step while still ensuring closed-loop stability and convergence to the optimal value. The paper also provides a model-free extension based on off-policy reinforcement learning (RL), which removes the need for complete knowledge of the system dynamics.

### Main Contributions of the Paper:

1. **Definition of the "Min-Hamiltonian"**: Unifies the HJB equation and the policy iteration (PI) algorithm, allowing the use of quasi-Newton methods to iteratively solve the HJB equation.
2. **Analysis of Approximation Error Impact**: Investigates how closed-loop stability and performance guarantees depend on the approximation residuals produced by inexact policy evaluation, and derives a sufficient condition that ensures convergence and closed-loop stability of the iterative learning algorithm.
3. **Robustness**: The proposed inexact method is robust to bounded approximation errors in terms of stability and performance improvement.
4. **Special Case for Linear Systems**: For linear systems, the iterative ADP algorithm reduces to the Newton-Kleinman iteration.

### Structure of the Paper:

- **Part II**: Problem statement, defining continuous-time nonlinear systems and their cost functions.
- **Part III**: Review of the Hamiltonian-driven framework for exact ADP.
- **Part IV**: Extension of the exact Hamiltonian-driven ADP to the inexact case.
- **Part VI**: Case studies of linear and nonlinear dynamical systems.
- **Part VII**: Conclusion and future research directions.

### Key Concepts:

- **Hamiltonian**: An important tool for evaluating any feasible policy.
- **Min-Hamiltonian**: Defined as \( h(x, p) = -\frac{1}{4} p^T g(x) R^{-1} g^T(x) p + p^T f(x) + Q(x) \).
- **Policy Evaluation**: Calculation of the cost for a given policy.
- **Policy Comparison**: Comparison of the performance of two different feasible policies.
- **Policy Improvement**: Design of an improved policy based on the current feasible policy.

### Main Results:

- **Convergence**: Under a sufficient condition on the iterative value gradient, the iterative ADP algorithm converges monotonically to the optimal value function.
- **Stability**: The iterative policy stabilizes the closed-loop system.
- **Handling Approximation Errors**: The proposed algorithm ensures convergence and stability in the presence of bounded approximation errors.

In summary, this paper introduces a new iterative ADP algorithm built on the "min-Hamiltonian" that accounts for approximation errors, addresses the optimal control problem for nonlinear systems, and provides theoretical and numerical validation.
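The linear special case mentioned above can be made concrete with a short sketch. It is not taken from the paper; it is the textbook Newton-Kleinman iteration, written under the assumption of dynamics \( \dot{x} = Ax + Bu \) and running cost \( x^T Q x + u^T R u \). With a quadratic value \( V(x) = x^T P x \), the gradient is \( p = 2Px \), and substituting into the min-Hamiltonian formula with \( f(x) = Ax \), \( g(x) = B \) gives \( h(x, 2Px) = x^T (A^T P + P A + Q - P B R^{-1} B^T P) x \), so a vanishing min-Hamiltonian corresponds to the algebraic Riccati equation. The function names, the double-integrator example, and the initial gain `K0` are illustrative assumptions.

```python
import numpy as np


def solve_lyapunov(Ac, M):
    """Solve Ac^T P + P Ac + M = 0 for P using Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    L = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off


def min_hamiltonian_matrix(P, A, B, Q, R):
    """Matrix H such that h(x, 2Px) = x^T H x for V(x) = x^T P x."""
    return A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T) @ P


def newton_kleinman(A, B, Q, R, K0, iters=20, tol=1e-10):
    """Policy iteration for the linear-quadratic case (K0 must be stabilizing)."""
    K = K0
    residuals = []
    for _ in range(iters):
        Ac = A - B @ K
        # Policy evaluation: cost matrix of the current policy u = -K x.
        P = solve_lyapunov(Ac, Q + K.T @ R @ K)
        # Policy improvement: u = -(1/2) R^{-1} B^T p with p = 2 P x.
        K = np.linalg.solve(R, B.T @ P)
        res = np.linalg.norm(min_hamiltonian_matrix(P, A, B, Q, R))
        residuals.append(res)
        if res < tol:
            break
    return P, K, residuals


# Hypothetical example: double integrator with an assumed stabilizing gain.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])

P, K, residuals = newton_kleinman(A, B, Q, R, K0)
print(P)
print(residuals)
```

For this example the Riccati equation can be solved by hand, giving \( P = \begin{pmatrix} \sqrt{3} & 1 \\ 1 & \sqrt{3} \end{pmatrix} \) and \( K = (1, \sqrt{3}) \), which the iteration should reproduce. The sketch performs exact policy evaluation; the inexact case studied in the paper would correspond to perturbing the Lyapunov solution at each step.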

Hamiltonian-Driven Adaptive Dynamic Programming With Approximation Errors

Hamiltonian-Driven Adaptive Dynamic Programming With Efficient Experience Replay

Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Model‐free optimal tracking over finite horizon using adaptive dynamic programming

Approximate dynamic programming for continuous state and control problems

Revisiting approximate dynamic programming and its convergence

Continuous-time finite-horizon ADP for automated vehicle controller design with high efficiency

Approximate Dynamic Programming for Constrained Piecewise Affine Systems with Stability and Safety Guarantees

Robust Approximate Dynamic Programming for Nonlinear Systems With Both Model Error and External Disturbance

Adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

Solving Finite-Horizon HJB for Optimal Control of Continuous-Time Systems

Model-Free Incremental Adaptive Dynamic Programming Based Approximate Robust Optimal Regulation

Approximate Finite-Horizon Optimal Control with Policy Iteration

Adaptive Multi-Step Evaluation Design With Stability Guarantee for Discrete-Time Optimal Learning Control

Theoretical and Numerical Analysis of Approximate Dynamic Programming with Approximation Errors

Intelligent Optimal Control of Constrained Nonlinear Systems Via Receding-Horizon Heuristic Dynamic Programming

Approximately Optimal Control of Discrete-Time Nonlinear Switched Systems Using Globalized Dual Heuristic Programming

Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems

H∞ optimal control of unknown linear systems by adaptive dynamic programming with applications to time‐delay systems