Abstract: $H_\infty$ control of nonlinear continuous-time systems depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation, which generally cannot be solved in closed form because of its nonlinearity. Many iterative algorithms have been proposed to solve the HJI equation, and most of them are essentially Newton's method applied to a fixed-point equation constructed in a Banach space. Newton's method is a local method: its convergence region is small, and the initial guess must be sufficiently close to the solution. The damped Newton method, by contrast, is more robust to the initial condition and has a larger convergence region. In this paper, a novel reinforcement learning method, named $\alpha$-policy iteration ($\alpha$-PI), is introduced for solving the HJI equation. First, by constructing a damped Newton iteration operator equation, a generalized Bellman equation (GBE) is obtained; the GBE is an extension of the Bellman equation. Second, by iterating on the GBE, an on-policy $\alpha$-PI reinforcement learning method is proposed that does not require knowledge of the system's internal dynamics. Third, based on the on-policy method, we develop an off-policy $\alpha$-PI reinforcement learning method that requires no knowledge of the system dynamics. Finally, neural-network-based adaptive critic implementation schemes of the on-policy and off-policy $\alpha$-PI algorithms are derived, with the batch least-squares method used to calculate the neural network weights. The effectiveness of the off-policy $\alpha$-PI algorithm is verified through computer simulation.
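
The contrast between Newton's method and its damped variant can be illustrated with a scalar toy problem. The sketch below is not the $\alpha$-PI algorithm and does not involve the HJI equation; it assumes a hypothetical test function $f(x) = \arctan(x)$, a hypothetical helper `newton_arctan`, and a fixed damping factor `alpha`, chosen only to show how damping enlarges the convergence region.

```python
import math


def newton_arctan(x0, alpha=1.0, n_iter=50):
    """Damped Newton iteration for the toy equation f(x) = arctan(x) = 0.

    alpha = 1.0 is the classical Newton step; 0 < alpha < 1 damps it.
    The function and its parameters are illustrative assumptions.
    """
    x = x0
    for _ in range(n_iter):
        # Newton direction f(x) / f'(x), with f'(x) = 1 / (1 + x^2)
        step = (1.0 + x * x) * math.atan(x)
        x = x - alpha * step
    return x


# Same initial guess, with and without damping.
# The undamped run is kept short because its iterates grow very quickly.
x_newton = newton_arctan(1.5, alpha=1.0, n_iter=5)
x_damped = newton_arctan(1.5, alpha=0.5, n_iter=50)

print("Newton (alpha=1.0):", x_newton)
print("Damped (alpha=0.5):", x_damped)
```

From the initial guess 1.5 the undamped iterates move away from the root at 0, while the damped iterates approach it. The price of damping in this toy case is a slower, linear rate of convergence near the root.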

Modified λ-Policy Iteration Based Adaptive Dynamic Programming for Unknown Discrete-Time Linear Systems

Modified general policy iteration based adaptive dynamic programming for unknown discrete‐time linear systems

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Bias-policy iteration based adaptive dynamic programming for unknown continuous-time linear systems

A Novel Policy Iteration Algorithm for Nonlinear Continuous-Time $H_\infty$ Control Problem

Adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems

On policy iteration‐based discounted optimal control

Model-Free $\delta$-Policy Iteration Based on Damped Newton Method for Nonlinear Continuous-Time $H_\infty$ Tracking Control

A New Continuous-Time Policy Iteration for Time-Varying Nonlinear Systems

Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control

Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms

Relaxed Policy Iteration Algorithm for Nonlinear Zero-Sum Games with Application to H-infinity Control

Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces

Scaling policy iteration based reinforcement learning for unknown discrete-time linear systems

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

A New Approach to Finite-Horizon Optimal Control for Discrete-Time Affine Nonlinear Systems via a Pseudolinear Method

Adaptive Multi-Step Evaluation Design With Stability Guarantee for Discrete-Time Optimal Learning Control

Adaptive Dynamic Programming for Nonaffine Nonlinear Optimal Control Problem with State Constraints

Adaptive Optimal Control with Guaranteed Convergence Rate for Continuous-Time Linear Systems with Completely Unknown Dynamics