Abstract: In this article, a real-time online off-policy reinforcement learning (RL) method is developed for the optimal control problem of unknown continuous-time nonlinear systems. First, by applying the temporal difference technique to the iterative procedure of off-policy RL, the iterative value function and the iterative policy input can be learned online in real time. It is proven that the fitting error of the neural network (NN) weights converges exponentially in each iteration. Second, a model-free Hamilton–Jacobi–Bellman equation (MF-HJBE) is derived by taking the limit of the iterative procedure of off-policy RL. The MF-HJBE not only eliminates the system dynamics that appear in the classical HJBE but also removes the iteration index. By applying temporal difference to the MF-HJBE, a real-time online tuning rule is designed to learn the optimal value function and the optimal policy input. It is proven that the fitting error of the NN weights under the real-time online tuning rule converges exponentially. Note that the two online tuning rules, the iterative one and the real-time one, use only current and previous state data extracted from system trajectories. Meanwhile, it is proven using Lyapunov's direct method that the system solution is uniformly ultimately bounded. Finally, simulation results demonstrate the validity of the proposed method.
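As a rough illustration of what a temporal-difference tuning rule driven only by the current and previous state samples might look like, the sketch below fits a one-parameter critic for a toy scalar system. It is not the article's algorithm: the system dx/dt = -x, the running cost x^2, the critic V(x) = w x^2, the normalized gradient step, and every name (`td_critic_fit`, `lr`, `eps`) are assumptions chosen for illustration, and the example performs policy evaluation only (no policy improvement, no off-policy input, no neural network).

```python
import math


def td_critic_fit(steps=200, dt=0.01, lr=0.5, eps=1e-12):
    """Fit w in V(x) ~= w * x**2 for the toy system dx/dt = -x with cost x**2.

    Illustrative assumption only; the true value function is V(x) = x**2 / 2,
    so w should approach 0.5.
    """
    w = 0.0          # critic weight (hypothetical one-parameter "network")
    x_prev = 1.0     # previous state sample
    for _ in range(steps):
        # Current state sample: exact flow of dx/dt = -x over one interval.
        x_curr = x_prev * math.exp(-dt)
        # Running cost integrated over the interval [t - dt, t].
        cost = x_prev ** 2 * (1.0 - math.exp(-2.0 * dt)) / 2.0
        # Regressor built from the previous and current state only.
        phi = x_prev ** 2 - x_curr ** 2
        # Temporal-difference error: V(x_prev) - V(x_curr) - integrated cost.
        td_error = w * phi - cost
        # Normalized gradient step on 0.5 * td_error**2.
        w -= lr * td_error * phi / (phi ** 2 + eps)
        x_prev = x_curr
    return w


if __name__ == "__main__":
    print(td_critic_fit())
```

In this toy setting the weight error shrinks by roughly the factor (1 - lr) at each sample, which gives a feel for the kind of exponential convergence the abstract refers to; the article's own proofs concern NN weights and are not reproduced here.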

Reinforcement Learning-Based Direct Adaptive Optimal Control of JLQ Model

Reinforcement Learning-Based $\mathcal{H}_{\infty}$ Control of 2-D Markov Jump Roesser Systems with Optimal Disturbance Attenuation

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Model-free optimal controller for discrete-time Markovian jump linear systems: A Q-learning approach

Optimization for Controlled Jump Rates of JLQG Problem

Optimal control for continuous-time Markov jump singularly perturbed systems: A hybrid reinforcement learning scheme

Fuzzy-Based Adaptive Optimization of Unknown Discrete-Time Nonlinear Markov Jump Systems With Off-Policy Reinforcement Learning

Online Reinforcement Learning-based Neural Network Controller Design for Affine Nonlinear Discrete-time Systems

H∞ Optimal Output Tracking Control for Markov Jump Systems: A Reinforcement Learning‐based Approach

Learning Algorithm for LQG Model with Constrained Control

Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints

Direct Optimization Based Compensation Adaptive Robust Control of Nonlinear Systems with State and Input Constraints

A Fuzzy-Model-Based Approach to Optimal Control for Nonlinear Markov Jump Singularly Perturbed Systems: A Novel Integral Reinforcement Learning Scheme

A Reinforcement Learning Method for LQR Control Problem

Reinforcement learning‐based composite suboptimal control for Markov jump singularly perturbed systems with unknown dynamics

Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks

Reinforcement Learning Controller Design for Affine Nonlinear Discrete-Time Systems Using Online Approximators

Online Adaptive Optimal Control Algorithm Based on Synchronous Integral Reinforcement Learning With Explorations

Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems

Online Adaptive Optimization Algorithm for Semi-Markov Control Processes

Average Cost Optimal Control of Stochastic Systems Using Reinforcement Learning