Abstract: Policy gradient adaptive dynamic programming (PGADP) is an effective approach for optimizing the performance of nonlinear systems. However, existing PGADP algorithms often require a large volume of interaction data with the system, which can be expensive or risky to collect. Moreover, the neural networks used in these algorithms can lead to low learning efficiency and unstable training. To address these challenges, a novel algorithm, referred to as OptNet-PGADP, is introduced. It uses an OptNet-based initial control policy to solve optimal control problems for discrete-time nonlinear systems. The OptNet-PGADP algorithm operates in two steps. First, the input–output trajectory of the system is computed using the nonlinear model predictive control (NMPC) method. Next, an initial admissible control policy is obtained through OptNet. This policy is then iteratively improved using the PGADP algorithm to obtain the optimal controller. The resulting closed-loop control policy can be readily deployed in real-time applications. The implementation uses OptNet for the actor network and adds an experience replay mechanism to improve the controller’s learning efficiency. A convergence and optimality analysis of the algorithm is also provided. Simulation and experimental results on two nonlinear systems show that the approach outperforms traditional PGADP and NMPC algorithms. These results indicate that OptNet-PGADP mitigates the limitations of current methods and achieves better control performance for nonlinear systems.
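To make the improvement stage concrete, the following is a minimal sketch of a policy-gradient ADP loop with experience replay. It is not the authors' implementation: the NMPC and OptNet stages are omitted, the OptNet actor is replaced by a linear gain `u = -k*x`, the critic is a quadratic Q-function fitted by least squares, and the scalar plant (`A`, `B`, `R`) and all function names are hypothetical choices made only for illustration. The initial gain `k0` stands in for the initial admissible policy and is assumed to be given.

```python
import numpy as np

# Hypothetical scalar plant x' = A*x + B*u. It only generates data;
# the learner below never reads A or B.
A, B, R = 0.9, 0.5, 1.0  # R weights the control in the cost x^2 + R*u^2


def collect_replay(n, rng):
    """Fill a replay buffer with n transitions (x, u, cost, x_next)."""
    x = rng.uniform(-1.0, 1.0, n)
    u = rng.uniform(-1.0, 1.0, n)  # exploratory inputs
    return x, u, x**2 + R * u**2, A * x + B * u


def features(x, u):
    """Quadratic critic basis: Q(x, u) = w0*x^2 + w1*x*u + w2*u^2."""
    return np.stack([x**2, x * u, u**2], axis=-1)


def evaluate_policy(k, replay):
    """Fit critic weights for u = -k*x from Q(x,u) - Q(x', -k*x') = cost."""
    x, u, cost, x_next = replay
    lhs = features(x, u) - features(x_next, -k * x_next)
    w, *_ = np.linalg.lstsq(lhs, cost, rcond=None)
    return w


def policy_gradient(k, w, replay):
    """Average dQ/dk over replayed states, with dQ/du taken at u = -k*x."""
    x = replay[0]
    dq_du = w[1] * x + 2.0 * w[2] * (-k * x)
    return np.mean(dq_du * (-x))  # chain rule: du/dk = -x


def train(k0=0.0, iters=60, lr=0.3, n=200, seed=0):
    """Alternate critic fitting and gradient steps on the gain k."""
    replay = collect_replay(n, np.random.default_rng(seed))
    k = k0  # k0 must be admissible (stabilizing); assumed given here
    for _ in range(iters):
        w = evaluate_policy(k, replay)
        k -= lr * policy_gradient(k, w, replay)
    return k


if __name__ == "__main__":
    print("learned gain:", train())
```

In this toy setting the same replay buffer is reused at every iteration, so no new interaction with the plant is needed after the initial data collection. For a linear plant with quadratic cost, the learned gain can be checked against the gain obtained from the discrete-time Riccati equation.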

Policy Gradient Adaptive Critic Designs for Model-Free Optimal Tracking Control With Experience Replay

Parallel Cross Entropy Policy Gradient Adaptive Dynamic Programming for Optimal Tracking Control of Discrete-Time Nonlinear Systems

A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems

Training Efficient Controllers via Analytic Policy Gradient

Adaptive critic-based tracking control of non-affine nonlinear discrete-time systems with unknown dynamics

Near Optimal Neural Network-based Output Feedback Control of Affine Nonlinear Discrete-Time Systems

Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS with Unidentified Exosystem Dynamics

Control of Nonaffine Nonlinear Discrete-Time Systems Using Reinforcement-Learning-Based Linearly Parameterized Neural Networks

Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems with Trajectory-Based Initial Control Policy

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Online Reinforcement Learning-based Neural Network Controller Design for Affine Nonlinear Discrete-time Systems

Observer-Based Event-Triggered Tracking Control for Discrete-Time Nonlinear Systems Using Adaptive Critic Design

Adaptive Learning-Based Path-Tracking Control for Unknown Vehicle Systems under Performance Optimization

Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces

Event‐triggered optimal tracking control of multiplayer unknown nonlinear systems via adaptive critic designs

Policy Gradient-based Model Free Optimal LQG Control with a Probabilistic Risk Constraint

Periodic event-triggered adaptive tracking control design for nonlinear discrete-time systems via reinforcement learning

Adaptive Neural Event-Triggered Optimal Tracking Control for Discrete-Time Pure-Feedback Systems

Policy Gradient Adaptive Dynamic Programming for Model-Free Multi-Objective Optimal Control