Abstract: The policy gradient adaptive dynamic programming (PGADP) technique has gained recognition as an effective approach for optimizing the performance of nonlinear systems. Nonetheless, existing PGADP algorithms often demand a substantial volume of expensive or potentially risky interaction data with the system. Moreover, the utilization of neural networks in these algorithms can result in suboptimal learning efficiency and unstable training procedures. To address these challenges, a novel algorithm, referred to as OptNet-PGADP, has been introduced. This algorithm integrates an initially tailored control policy based on OptNet to tackle the optimization of control problems in discrete-time nonlinear systems. The OptNet-PGADP algorithm operates through a two-step process. Initially, the input–output trajectory of the system is computed using the nonlinear model predictive control (NMPC) method. Subsequently, an initial admissible control policy is acquired through OptNet. This policy is iteratively enhanced using the PGADP algorithm to attain the optimal controller. The resulting closed-loop control policy can be readily deployed in real-time applications. The implementation of the algorithm employs OptNet for the actor network and integrates an experience replay mechanism to bolster the controller’s learning efficiency. Furthermore, a convergence and optimality analysis of the algorithm is included. Simulation and experimental results conducted on two nonlinear systems conclusively demonstrate that the approach outperforms traditional PGADP and NMPC algorithms. These findings underscore the efficacy of OptNet-PGADP in mitigating the constraints of current methods and achieving superior control performance for nonlinear systems.
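The pipeline described in the abstract (trajectory generation, an initial admissible policy fitted to that trajectory, then policy-gradient refinement from replayed data) can be illustrated on a toy problem. The sketch below is not the paper's implementation, and every name and number in it is an assumption made for illustration: the plant is a hypothetical scalar linear system, a fixed stabilizing feedback stands in for NMPC, a least-squares linear gain stands in for the OptNet actor, and the critic is a quadratic Q-function fitted by least squares on a fixed buffer of random transitions.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.9, 0.5  # hypothetical scalar plant: x_next = a*x + b*u

def step(x, u):
    return a * x + b * u

def stage_cost(x, u):
    return x**2 + u**2

def expert_trajectory(x0=1.0, horizon=20, k_expert=0.3):
    """Stage 1 stand-in for NMPC: roll out a fixed stabilizing feedback."""
    xs, us, x = [], [], x0
    for _ in range(horizon):
        u = -k_expert * x
        xs.append(x)
        us.append(u)
        x = step(x, u)
    return np.array(xs), np.array(us)

def fit_initial_gain(xs, us):
    """Stand-in for the OptNet actor: least-squares fit of u = -k*x."""
    return -float(xs @ us) / float(xs @ xs)

def features(x, u):
    # Quadratic Q-function: Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2
    return np.stack([x**2, 2 * x * u, u**2], axis=-1)

def evaluate_policy(k, buffer):
    """Critic: solve Q(x, u) - Q(x', -k*x') = cost(x, u) by least squares."""
    x, u, x_next = buffer
    phi = features(x, u) - features(x_next, -k * x_next)
    theta, *_ = np.linalg.lstsq(phi, stage_cost(x, u), rcond=None)
    return theta

def policy_gradient(k, theta, x):
    """Gradient of mean Q(x, -k*x) with respect to k (chain rule via u)."""
    _, h_xu, h_uu = theta
    u = -k * x
    dq_du = 2 * h_xu * x + 2 * h_uu * u
    return float(np.mean(dq_du * (-x)))  # du/dk = -x

def closed_loop_cost(k, x0=1.0, horizon=200):
    total, x = 0.0, x0
    for _ in range(horizon):
        u = -k * x
        total += stage_cost(x, u)
        x = step(x, u)
    return total

def riccati_gain(iterations=1000):
    """Reference optimum, available only because this toy case is linear-quadratic."""
    p = 1.0
    for _ in range(iterations):
        p = 1 + a**2 * p - (a * b * p) ** 2 / (1 + b**2 * p)
    return a * b * p / (1 + b**2 * p)

# Stage 1: initial admissible policy from the expert trajectory.
xs, us = expert_trajectory()
k0 = fit_initial_gain(xs, us)

# Replay buffer of exploratory (off-policy) transitions, reused every iteration.
x_buf = rng.uniform(-1, 1, 200)
u_buf = rng.uniform(-1, 1, 200)
buffer = (x_buf, u_buf, step(x_buf, u_buf))

# Stage 2: alternate critic fitting and a few policy-gradient steps.
k = k0
for _ in range(30):
    theta = evaluate_policy(k, buffer)
    batch = x_buf[rng.choice(200, size=64, replace=False)]
    for _ in range(5):
        k -= 0.2 * policy_gradient(k, theta, batch)

print(f"initial gain {k0:.4f}, refined gain {k:.4f}, Riccati gain {riccati_gain():.4f}")
print(f"closed-loop cost: initial {closed_loop_cost(k0):.4f}, refined {closed_loop_cost(k):.4f}")
```

In this toy setting the buffer must contain actions that are not a linear function of the state; otherwise the three quadratic features are collinear and the critic fit is ill-posed. The sketch does not cover a nonlinear plant, a neural critic, or an actual OptNet layer.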

Adaptive Optimal Control of Nonlinear Systems with Multiple Time-scale Eligibility Traces

Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces

Online Optimal Control of Discrete-Time Systems Based on Globalized Dual Heuristic Programming with Eligibility Traces

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Heuristic Dynamic Programming Strategy with Eligibility Traces

Adaptive Multi-Step Evaluation Design With Stability Guarantee for Discrete-Time Optimal Learning Control

Multi-step Heuristic Dynamic Programming for Optimal Control of Nonlinear Discrete-Time Systems

Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems with Trajectory-Based Initial Control Policy

Nearly Finite-Horizon Optimal Control for A Class of Nonaffine Time-Delay Nonlinear Systems Based on Adaptive Dynamic Programming

Event-triggered optimal decentralized control for stochastic interconnected nonlinear systems via adaptive dynamic programming

Output Tracking Control Based on Adaptive Dynamic Programming with Multistep Policy Evaluation

A Parallel Framework of Adaptive Dynamic Programming Algorithm with Off-Policy Learning

Multi-Objective Optimal Control for A Class of Nonlinear Time-Delay Systems Via Adaptive Dynamic Programming

Tracking Control of Affine Nonlinear Discrete-Time Systems Based on Gaussian-kernel-based ADP

Multiple Model Adaptive Tracking Control Based on Adaptive Dynamic Programming

Optimal tracking control of a class of nonlinear discrete-time switched systems using adaptive dynamic programming

Adaptive dynamic programming for optimal control of discrete‐time nonlinear system with state constraints based on control barrier function

Adaptive Dynamic Programming for Nonaffine Nonlinear Optimal Control Problem with State Constraints

Deterministic Policy Gradient Adaptive Dynamic Programming for Model-Free Optimal Control

Local Policy Iteration Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems