Abstract: In this article, the generalized N-step value gradient learning (GNSVGL) algorithm, which takes a long-term prediction parameter λ into account, is developed for infinite-horizon discounted near-optimal control of discrete-time nonlinear systems. The proposed GNSVGL algorithm accelerates the learning process of adaptive dynamic programming (ADP) and achieves better performance by learning from more than one future reward. Unlike the traditional N-step value gradient learning (NSVGL) algorithm, which starts from zero initial functions, the proposed GNSVGL algorithm is initialized with positive definite functions. A convergence analysis of the value-iteration-based algorithm is provided for different initial cost functions. A stability condition for the iterative control policy is established to determine the iteration index at which the control law makes the system asymptotically stable. Under this condition, if the system is asymptotically stable at the current iteration, then all subsequent iterative control laws are guaranteed to be stabilizing. Two critic neural networks and one action network are constructed to approximate the one-return costate function, the λ-return costate function, and the control law, respectively. The one-return and λ-return critic networks are combined to train the action neural network. Finally, simulation studies and comparisons confirm the superiority of the developed algorithm.
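To make the role of λ concrete, the following is a minimal Python sketch of the textbook truncated λ-return for a scalar discounted cost. It is not the GNSVGL update from the article, which works with costate (value gradient) functions and neural network critics. The function names, the argument layout, and the use of scalar value estimates are all assumptions made for illustration.

```python
def n_step_return(costs, values, gamma, n):
    """Discounted sum of the first n stage costs plus a bootstrapped tail.

    costs[j]  : stage cost incurred at step j, for j = 0 .. N-1
    values[j] : current value estimate at the state reached after j steps,
                for j = 0 .. N
    (Hypothetical layout, chosen only for this illustration.)
    """
    discounted = sum(gamma ** j * costs[j] for j in range(n))
    return discounted + gamma ** n * values[n]


def truncated_lambda_return(costs, values, gamma, lam):
    """Weighted average of the 1-step to N-step returns, with N = len(costs).

    Weights are (1 - lam) * lam**(n - 1) for n < N, and the remaining
    weight lam**(N - 1) goes to the full N-step return, so they sum to 1.
    """
    horizon = len(costs)
    if horizon == 0 or len(values) != horizon + 1:
        raise ValueError("need N >= 1 costs and exactly N + 1 value estimates")
    total = sum(
        (1.0 - lam) * lam ** (n - 1) * n_step_return(costs, values, gamma, n)
        for n in range(1, horizon)
    )
    total += lam ** (horizon - 1) * n_step_return(costs, values, gamma, horizon)
    return total
```

In this sketch, setting `lam = 0` recovers the one-step return and `lam = 1` recovers the plain N-step return, so λ interpolates between learning from a single future reward and learning from all N of them.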

Discrete-Time Stable Generalized Self-Learning Optimal Control With Approximation Errors

Stable Generalized Predictive Control and Its Performance Analysis

Modified general policy iteration based adaptive dynamic programming for unknown discrete‐time linear systems

Approximate Finite-Horizon Optimal Control with Policy Iteration

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems

Learning Optimal Control Policy for Unknown Discrete-Time Systems

An Open-Closed-Loop PI-Type Iterative Learning Control Scheme for Discrete Nonlinear Time-Varying Systems and Its Convergence

Discrete-Time Adaptive Iterative Learning Control for High-Order Nonlinear Systems with Unknown Control Directions

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

Error Bound Analysis of Q-Function for Discounted Optimal Control Problems With Policy Iteration

Solving optimal predictor-feedback control using approximate dynamic programming

Iterative GDHP-based Approximate Optimal Tracking Control for a Class of Discrete-Time Nonlinear Systems

Approximate Policy Iteration for Robust Stochastic Control of Multi-agent Markov Decision Processes

Discrete-Time Self-Learning Parallel Control

Finite-Horizon ε-Optimal Tracking Control of Discrete-Time Linear Systems Using Iterative Approximate Dynamic Programming

Adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems

Data-driven iterative adaptive dynamic programming algorithm for approximate optimal control of unknown nonlinear systems

A New Approach to Finite-Horizon Optimal Control for Discrete-Time Affine Nonlinear Systems via a Pseudolinear Method

Convergence and Stability of Optimal Regulation via Generalized N-Step Value Gradient Learning

Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces