Abstract: This paper presents a $\delta$-policy iteration ($\delta$-PI) algorithm, based on the damped Newton method, for the $H_\infty$ tracking control problem of unknown continuous-time nonlinear systems. A discounted performance function and an augmented system are used to derive the tracking Hamilton-Jacobi-Isaacs (HJI) equation. The tracking HJI equation is a nonlinear partial differential equation, and traditional reinforcement learning methods for solving it are mostly based on the Newton method, which usually guarantees only local convergence and needs a good initial guess. Starting from the damped Newton iteration operator equation, a generalized tracking Bellman equation is first derived. The $\delta$-PI algorithm then seeks the optimal solution of the tracking HJI equation by iteratively solving the generalized tracking Bellman equation. Both on-policy and off-policy $\delta$-PI reinforcement learning methods are provided. The off-policy $\delta$-PI algorithm is model-free: it can be performed without a priori knowledge of the system dynamics. A neural-network (NN) based implementation scheme for the off-policy $\delta$-PI algorithm is given. The suitability of the model-free $\delta$-PI algorithm is illustrated with a nonlinear system simulation.
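
The contrast the abstract draws between the Newton method and its damped variant can be illustrated on a scalar toy problem. The sketch below is not the $\delta$-PI algorithm and does not solve an HJI equation; it only assumes the textbook damped Newton update $x \leftarrow x - \delta f(x)/f'(x)$ with a fixed step $\delta \in (0, 1]$. The test function $\arctan(x)$, the starting point, the step size, and the function name `damped_newton` are all illustrative choices, not taken from the paper.

```python
import math


def damped_newton(f, df, x0, delta=1.0, tol=1e-10, max_iter=200):
    """Iterate x <- x - delta * f(x) / df(x).

    delta = 1 gives the plain Newton method; 0 < delta < 1 damps the step.
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:  # stop once the residual is small enough
            break
        x = x - delta * fx / df(x)
    return x


# Toy problem (an assumption for illustration): find the root of atan(x) = 0.
f = math.atan
df = lambda x: 1.0 / (1.0 + x * x)

# Plain Newton from a poor initial guess: the iterates grow in magnitude.
x_newton = damped_newton(f, df, x0=3.0, delta=1.0, max_iter=5)

# Damped Newton from the same guess: the iterates shrink toward the root at 0.
x_damped = damped_newton(f, df, x0=3.0, delta=0.1, max_iter=500)

print("plain Newton after 5 steps:", x_newton)
print("damped Newton result:", x_damped)
```

In this toy case the damping trades speed for a larger region of convergence: the damped iteration converges only linearly near the root, whereas the undamped one converges quadratically when it is started close enough.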

On policy iteration‐based discounted optimal control

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Modified λ-Policy Iteration Based Adaptive Dynamic Programming for Unknown Discrete-Time Linear Systems

Extended State Space Predictive Control for a Class of Nonlinear Systems

Policy iteration for discrete-time systems with discounted costs: stability and near-optimality guarantees

An Improved PI Controller for Stiction Compensation of Control Valves in Process Industry

Policy Iteration Based Feedback Control

Iterative design of suboptimal feedback control for bilinear parabolic PDE systems

Modified general policy iteration based adaptive dynamic programming for unknown discrete‐time linear systems

Model-Free $\delta$-Policy Iteration Based on Damped Newton Method for Nonlinear Continuous-Time $H_\infty$ Tracking Control

Continuous-Time Distributed Policy Iteration for Multicontroller Nonlinear Systems

An Open-Closed-Loop PI-Type Iterative Learning Control Scheme for Discrete Nonlinear Time-Varying Systems and Its Convergence

Policy Iteration-Based Learning Design for Linear Continuous-Time Systems Under Initial Stabilizing OPFB Policy

A New Continuous-Time Policy Iteration for Time-Varying Nonlinear Systems

On Convergence Analysis of Policy Iteration Algorithms for Entropy-Regularized Stochastic Control Problems

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

A policy iteration algorithm for non-Markovian control problems

Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control

Bias-policy iteration based adaptive dynamic programming for unknown continuous-time linear systems