Abstract: Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem of nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed by using value iteration-based Q-learning (VIQL) with a critic-only structure. Most existing constrained control methods require a specific performance index and are suited only to linear or affine nonlinear systems, which is restrictive in practice. To overcome this limitation, a system transformation is first introduced together with a general performance index. The constrained optimal control problem is then converted to an unconstrained optimal control problem. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. Convergence results for the VIQL algorithm are established under an easy-to-realize initial condition. To implement the VIQL algorithm, a critic-only structure is developed, in which only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is used to design the adaptive constrained optimal controller based on a gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is demonstrated on three examples in computer simulations.
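To make the value-iteration idea concrete, the following is a minimal tabular sketch, not the paper's method. It iterates Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(x', u') on a grid, where x' is the observed successor state. Everything specific here is an assumption made for illustration: the scalar nonaffine system, the quadratic utility, the bound |u| <= 1, the initial condition Q_0 = 0, and the function name `viql`. The abstract's system transformation and critic network are replaced by a bounded action grid and a lookup table, and the gradient-descent controller by a grid argmin.

```python
import numpy as np

def viql(states, actions, step, cost, n_iter=200, tol=1e-10):
    """Tabular value-iteration Q-learning on a state/action grid (illustrative)."""
    X, U = np.meshgrid(states, actions, indexing="ij")
    # Snap each successor state to its nearest grid point (this also clips to the grid).
    nxt_idx = np.abs(step(X, U)[..., None] - states).argmin(axis=-1)
    c = cost(X, U)
    Q = np.zeros_like(c)  # assumed initial condition: Q_0 = 0
    for _ in range(n_iter):
        # Q_{i+1}(x, u) = U(x, u) + min_{u'} Q_i(x', u')
        Q_new = c + Q.min(axis=1)[nxt_idx]
        converged = np.max(np.abs(Q_new - Q)) < tol
        Q = Q_new
        if converged:
            break
    return Q

states = np.linspace(-1.0, 1.0, 21)
actions = np.linspace(-1.0, 1.0, 21)  # hypothetical control bound |u| <= 1

# Hypothetical nonaffine system, used only to generate transition data.
Q = viql(states, actions,
         step=lambda x, u: 0.8 * np.sin(x) + np.tanh(u),
         cost=lambda x, u: x**2 + u**2)

# Greedy controller: pick the grid action that minimizes Q at each state.
policy = actions[Q.argmin(axis=1)]
```

The `step` function is queried only for successor states, so it stands in for measured data. Restricting the action grid enforces the bound by construction, which is a simpler device than the transformation described in the abstract.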

Discrete-time optimal control scheme based on Q-learning algorithm

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

A Q-Learning Algorithm for Discrete-Time Linear-Quadratic Control with Random Parameters of Unknown Distribution: Convergence and Stabilization

Q-Learning for Linear Quadratic Optimal Control with Terminal State Constraint

Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints

Discrete-time adaptive iterative learning control with unknown control directions

Learning Algorithm for LQG Model with Constrained Control

Discrete-Time Adaptive Iterative Learning Control for High-Order Nonlinear Systems with Unknown Control Directions

Model-Free Optimal Tracking Design With Evolving Control Strategies via Q-Learning

Adaptive Constrained Optimal Control Design for Data-Based Nonlinear Discrete-Time Systems With Critic-Only Structure

A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems

An Open-Closed-Loop PI-Type Iterative Learning Control Scheme for Discrete Nonlinear Time-Varying Systems and Its Convergence

Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate

Reinforcement Learning-Based Control for a Class of Nonlinear Systems with Unknown Control Directions

Fuzzy Optimal Control for a Class of Discrete-Time Switched Nonlinear Systems

Optimal Tracking Control of Nonlinear Multiagent Systems Using Internal Reinforce Q-Learning

Reinforcement Learning-Based Direct Adaptive Optimal Control of JLQ Model

The Adaptive Optimal Output Feedback Tracking Control of Unknown Discrete-Time Linear Systems Using a Multistep Q-Learning Approach

A novel iterative learning control scheme based on Broyden‐class optimization method

Approximate Q-Learning for Controlled Diffusion Processes and its Near Optimality

Observation-based Optimal Control Law Learning with LQR Reconstruction