Abstract: $H_\infty$ control of nonlinear continuous-time systems depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation, which in general cannot be solved in closed form owing to its nonlinearity. To solve the HJI equation, many iterative algorithms have been proposed, and most of them are essentially Newton's method when the fixed-point equation is constructed in a Banach space. Newton's method is a local method: it has a small convergence region and requires the initial guess to be sufficiently close to the solution. The damped Newton method, by contrast, is more robust with respect to the initial condition and has a larger convergence region. In this paper, a novel reinforcement learning method, named $\alpha$-policy iteration ($\alpha$-PI), is introduced for solving the HJI equation. First, by constructing a damped Newton iteration operator equation, a generalized Bellman equation (GBE) is obtained; the GBE is an extension of the Bellman equation. Second, by iterating on the GBE, an on-policy $\alpha$-PI reinforcement learning method that does not use knowledge of the system's internal dynamics is proposed. Third, based on the on-policy $\alpha$-PI method, we develop an off-policy $\alpha$-PI reinforcement learning method that requires no knowledge of the system dynamics. Finally, neural-network-based adaptive critic implementation schemes for the on-policy and off-policy $\alpha$-PI algorithms are derived, and the batch least-squares method is used to calculate the neural network weights. The effectiveness of the off-policy $\alpha$-PI algorithm is verified through computer simulation.
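The contrast between Newton's method and the damped Newton method can be illustrated on a scalar root-finding problem. The sketch below is not the $\alpha$-PI algorithm and does not solve an HJI equation. It only shows the generic update $x_{k+1} = x_k - \alpha f(x_k)/f'(x_k)$, where $\alpha = 1$ gives plain Newton and $0 < \alpha < 1$ gives a damped step. The test function $\arctan(x)$, the initial guess 1.5, the fixed step size $\alpha = 0.5$, and the names `damped_newton` and `blowup` are assumptions chosen for illustration, not taken from the paper.

```python
import math


def damped_newton(f, df, x0, alpha=1.0, tol=1e-10, max_iter=200, blowup=1e6):
    """Iterate x <- x - alpha * f(x) / df(x).

    alpha = 1.0 is plain Newton; 0 < alpha < 1 is a fixed-step damped Newton.
    Returns (last iterate, iterations used, converged flag).
    """
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:          # residual small enough: accept x as a root
            return x, k, True
        x = x - alpha * fx / df(x)
        if abs(x) > blowup:        # iterate has run away: report divergence
            return x, k + 1, False
    return x, max_iter, False


def f(x):
    # Illustrative test equation f(x) = arctan(x), whose only root is x = 0.
    return math.atan(x)


def df(x):
    # Derivative of arctan(x).
    return 1.0 / (1.0 + x * x)


if __name__ == "__main__":
    for alpha in (1.0, 0.5):
        x, iters, ok = damped_newton(f, df, x0=1.5, alpha=alpha)
        status = "converged" if ok else "diverged"
        print(f"alpha={alpha}: {status} after {iters} iterations, x={x:.3e}")
```

From the same initial guess, the full Newton step overshoots further on every iteration, while the damped step contracts toward the root. The price is a linear rather than quadratic local convergence rate when $\alpha$ is held fixed below 1.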

Discrete-Time Nonlinear Generalized Policy Iteration for Optimal Control Using Neural Networks

Modified general policy iteration based adaptive dynamic programming for unknown discrete‐time linear systems

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Fast Algorithm for Adaptive Generalized Predictive Control Based on Bp Neural Networks

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

A Novel Policy Iteration Algorithm for Nonlinear Continuous-Time $H_\infty$ Control Problem

Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces

A New Continuous-Time Policy Iteration for Time-Varying Nonlinear Systems

Adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems

Modified λ-Policy Iteration Based Adaptive Dynamic Programming for Unknown Discrete-Time Linear Systems

Adaptive Dynamic Programming for Nonaffine Nonlinear Optimal Control Problem with State Constraints

Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control

Approximate Finite-Horizon Optimal Control with Policy Iteration

A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems

Continuous-Time Distributed Policy Iteration for Multicontroller Nonlinear Systems

Revisiting approximate dynamic programming and its convergence

Convergence and Stability of Optimal Regulation via Generalized N-Step Value Gradient Learning

On policy iteration‐based discounted optimal control

Parallel Cross Entropy Policy Gradient Adaptive Dynamic Programming for Optimal Tracking Control of Discrete-Time Nonlinear Systems