Abstract: $H_\infty$ control of nonlinear continuous-time systems depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation, which in general admits no closed-form solution because of its nonlinearity. Many iterative algorithms have been proposed for solving the HJI equation, and most of them are essentially Newton's method once the fixed-point equation is constructed in a Banach space. Newton's method is a local method: it has a small convergence region and requires an initial guess sufficiently close to the solution. The damped Newton method, by contrast, is more robust to the initial condition and has a larger convergence region. In this paper, a novel reinforcement learning method named $\alpha$-policy iteration ($\alpha$-PI) is introduced for solving the HJI equation. First, by constructing a damped Newton iteration operator equation, a generalized Bellman equation (GBE) is obtained; the GBE is an extension of the Bellman equation. Second, by iterating on the GBE, an on-policy $\alpha$-PI reinforcement learning method is proposed that does not use knowledge of the internal system dynamics. Third, building on the on-policy method, an off-policy $\alpha$-PI reinforcement learning method is developed that requires no knowledge of the system dynamics at all. Finally, neural-network-based adaptive critic implementation schemes are derived for the on-policy and off-policy $\alpha$-PI algorithms, with the neural network weights computed by the batch least-squares method. The effectiveness of the off-policy $\alpha$-PI algorithm is verified through computer simulation.
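
The contrast between Newton's method and its damped variant can be illustrated with a minimal sketch on a scalar equation. This is not the $\alpha$-PI algorithm or the HJI operator from the paper; the test function arctan(x), the fixed step size `alpha = 0.1`, the starting point and the helper name `newton_like` are all assumptions chosen for illustration. The update is x <- x - alpha * f(x) / f'(x), which reduces to plain Newton when alpha = 1.

```python
import math

def newton_like(f, df, x0, alpha=1.0, tol=1e-10, max_iter=500, blowup=1e6):
    """Iterate x <- x - alpha * f(x) / df(x).

    alpha = 1 gives plain Newton; 0 < alpha < 1 gives a fixed-step
    damped Newton iteration. Returns (x, converged).
    """
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            return x, True       # residual small enough: converged
        if abs(x) > blowup:
            return x, False      # iterate ran away: treat as divergence
        x = x - alpha * f(x) / df(x)
    return x, abs(f(x)) < tol

# Hypothetical test problem: solve arctan(x) = 0 (root at x = 0).
f = math.atan
df = lambda x: 1.0 / (1.0 + x * x)

x0 = 3.0  # assumed starting point, outside plain Newton's convergence region
x_newton, ok_newton = newton_like(f, df, x0, alpha=1.0)
x_damped, ok_damped = newton_like(f, df, x0, alpha=0.1)

print("plain Newton converged: ", ok_newton)
print("damped Newton converged:", ok_damped, "x =", x_damped)
```

From this starting point the undamped iterates overshoot the root by a growing margin, while the damped iterates approach it monotonically. The price is speed: with a fixed alpha below 1 the convergence near the root is linear rather than quadratic.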

Neuro-Control for Continuous-Time Stochastic Nonlinear Systems Via Online Policy Iteration Algorithm

A nonlinear predictive control algorithm based on fuzzy online modeling and discrete optimization

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks

Online Reinforcement Learning-based Neural Network Controller Design for Affine Nonlinear Discrete-time Systems

Online Adaptive Optimal Control for Continuous-Time Nonlinear Systems with Completely Unknown Dynamics

Online reinforcement learning control of unknown nonaffine nonlinear discrete time systems

Near Optimal Neural Network-based Output Feedback Control of Affine Nonlinear Discrete-Time Systems

Continuous-Time Distributed Policy Iteration for Multicontroller Nonlinear Systems

Neural Stochastic Control

Neural network-based finite-horizon optimal control of uncertain affine nonlinear discrete-time systems

A New Continuous-Time Policy Iteration for Time-Varying Nonlinear Systems

Distributed Optimal Control of Nonlinear System Based on Policy Gradient with External Disturbance

A policy iteration algorithm for non-Markovian control problems

Decentralized Adaptive Neural Inverse Optimal Control of Nonlinear Interconnected Systems

A Novel Policy Iteration Algorithm for Nonlinear Continuous-Time H$\infty$ Control Problem

Optimal control for continuous-time Markov jump singularly perturbed systems: A hybrid reinforcement learning scheme

Online Non-stochastic Control with Partial Feedback

Nearly optimal stabilization of unknown continuous-time nonlinear systems: A new parallel control approach

Policy Iteration Based Feedback Control

Observer-Based Adaptive Optimized Control for Stochastic Nonlinear Systems With Input and State Constraints