Abstract: In this paper, a novel adaptive dynamic programming (ADP)-based optimal control method is developed for discrete-time systems subject to constraints and disturbances. In particular, a safe policy iteration scheme is designed to handle state and input constraints, both hard and soft, by converting the original policy improvement step into a constrained optimization problem with a prescribed state cost function. An actor-critic-disturbance framework is then introduced to solve the resulting constrained optimal control problem. Robust safety against disturbances is formulated as a two-player zero-sum game, in which the actor and disturbance neural networks approximate the optimal control input and the disturbance policy, respectively. The convergence of the proposed algorithm is analyzed, and a multi-step version of the ADP scheme is derived from this convergence property. Simulation results are presented and discussed to validate the effectiveness and performance of the proposed method.

Note to Practitioners: Handling constraints in optimal control problems is essential for guaranteeing the safe operation of controlled systems. However, conventional ADP algorithms struggle to handle state and control input constraints simultaneously while searching for the optimal solution. Another critical and common issue in real-world applications is external disturbances: any disturbance that drives the controlled system out of the safe region must be accounted for while seeking an optimal control policy. With these factors in mind, this study presents a novel safe ADP (SADP) scheme for solving optimal control problems of discrete-time systems that accounts for state and control constraints as well as the effect of disturbances. Moreover, a convergence analysis of the proposed SADP scheme is provided, which offers a theoretical foundation for guaranteeing the safety and feasibility of the controlled system during operation.
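To make the min-max structure described in the abstract concrete, the following is a minimal Python sketch under stated assumptions, not the authors' algorithm. It swaps the actor, critic, and disturbance neural networks for a value table on a state grid and uses exact min-max policy iteration: each control policy is evaluated against a best-responding disturbance (the maximizing player), and the control is then improved by minimizing the worst-case cost over robustly safe inputs only. The scalar plant `x_{k+1} = A*x + B*u + w`, all numerical values, the discount factor `ALPHA`, the quadratic soft-constraint penalty `RHO`, and every function name are hypothetical choices made for illustration.

```python
import numpy as np

# Illustrative assumptions (not from the paper): scalar plant x_{k+1} = A*x + B*u + w
A, B = 1.2, 1.0
Q, R = 1.0, 0.5             # state and input weights
GAMMA2 = 4.0                # attenuation level gamma^2 in the zero-sum stage cost
ALPHA = 0.95                # discount factor, added only to keep this sketch simple
X_HARD, X_SOFT = 2.0, 1.0   # hard bound |x| <= X_HARD; soft bound penalized beyond X_SOFT
RHO = 10.0                  # soft-constraint penalty weight (hypothetical)

xs = np.linspace(-X_HARD, X_HARD, 81)  # state grid over the hard-constrained set
us = np.linspace(-1.0, 1.0, 21)        # admissible inputs |u| <= 1 (hard input constraint)
ws = np.linspace(-0.1, 0.1, 5)         # admissible disturbances |w| <= 0.1


def stage_cost(x, u, w):
    """Quadratic cost plus soft-constraint penalty, minus the disturbance term."""
    soft = np.maximum(0.0, np.abs(x) - X_SOFT)
    return Q * x**2 + R * u**2 + RHO * soft**2 - GAMMA2 * w**2


def q_value(V, x, u, w):
    """One-step cost plus discounted, linearly interpolated value of the successor."""
    return stage_cost(x, u, w) + ALPHA * np.interp(A * x + B * u + w, xs, V)


def safe_inputs(x):
    """Inputs whose successor stays in the hard set for every admissible w."""
    return [u for u in us if np.all(np.abs(A * x + B * u + ws) <= X_HARD)]


def evaluate(u_pol, sweeps=400):
    """Value of a fixed control policy against a best-responding disturbance."""
    V = np.zeros_like(xs)
    for _ in range(sweeps):
        cand = [stage_cost(xs, u_pol, w)
                + ALPHA * np.interp(A * xs + B * u_pol + w, xs, V) for w in ws]
        V = np.max(cand, axis=0)  # disturbance player maximizes at every state
    return V


def improve(V):
    """Min-max improvement restricted to robustly safe inputs."""
    u_pol, w_pol = np.empty_like(xs), np.empty_like(xs)
    for i, x in enumerate(xs):
        candidates = safe_inputs(x)
        if not candidates:
            raise ValueError(f"no robustly safe input at x = {x:.3f}")
        best_val = np.inf
        for u in candidates:
            vals = [q_value(V, x, u, w) for w in ws]
            k = int(np.argmax(vals))   # worst-case disturbance for this input
            if vals[k] < best_val:     # control player minimizes the worst case
                best_val, u_pol[i], w_pol[i] = vals[k], u, ws[k]
    return u_pol, w_pol


def robust_policy_iteration(max_iters=30):
    # Initial admissible policy: saturated deadbeat control (robustly safe for these numbers)
    u_pol = np.clip(-A * xs / B, us[0], us[-1])
    w_pol = np.zeros_like(xs)
    for _ in range(max_iters):
        new_u, w_pol = improve(evaluate(u_pol))
        if np.array_equal(new_u, u_pol):
            break
        u_pol = new_u
    return evaluate(u_pol), u_pol, w_pol


if __name__ == "__main__":
    V, u_pol, w_pol = robust_policy_iteration()
    i0 = int(np.argmin(np.abs(xs)))
    print("V(0) =", V[i0], " u(x_max) =", u_pol[-1], " w(x_max) =", w_pol[-1])
```

In this sketch the hard constraints are enforced by filtering candidate inputs (an input is kept only if the successor stays in the hard set for every admissible disturbance), while the soft bound `X_SOFT` only adds cost. This mirrors the hard/soft distinction in the abstract at a much simplified level; the grid, interpolation, and discount are conveniences of the sketch rather than features of the proposed method.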

Data-Based Optimal Switching and Control with Admissibility Guaranteed Q-learning

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Approximately Optimal Control of Discrete-Time Nonlinear Switched Systems Using Globalized Dual Heuristic Programming

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Fuzzy Optimal Control for a Class of Discrete-Time Switched Nonlinear Systems

Data-Driven Event-Triggered Adaptive Dynamic Programming Control for Nonlinear Systems with Input Saturation

Logic Switching Based Online Periodic Adaptive Learning Control Algorithm Dealing with Unknown Period and Bound of the Uncertain Parameter

A hybrid model-based optimal control method for nonlinear systems using simultaneous dynamic optimization strategies

A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems

Performance-Guaranteed Fault-Tolerant Control for Uncertain Nonlinear Systems via Learning-Based Switching Scheme

Online Reinforcement Learning-based Neural Network Controller Design for Affine Nonlinear Discrete-time Systems

Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces

Discrete-Time Self-Learning Parallel Control

ADP-Based Optimal Control for Discrete-Time Systems With Safe Constraints and Disturbances

Costate-Supplement ADP for Model-Free Optimal Control of Discrete-Time Nonlinear Systems

Adaptive dynamic programming for optimal control of discrete‐time nonlinear system with state constraints based on control barrier function

Adaptive autonomous soaring of multiple UAVs using Simultaneous Perturbation Stochastic Approximation

Online Adaptive Optimal Control for Continuous-Time Nonlinear Systems with Completely Unknown Dynamics

Adaptive Multi-Step Evaluation Design With Stability Guarantee for Discrete-Time Optimal Learning Control

Adaptive Optimal Control via Q-Learning for Itô Fuzzy Stochastic Nonlinear Continuous-Time Systems With Stackelberg Game