Abstract: This article investigates the optimal distributed consensus control problem for discrete-time multiagent systems with completely unknown dynamics and differing computational abilities. The problem can be viewed as solving nonzero-sum games with distributed reinforcement learning (RL), where each agent is a player in these games. First, to guarantee the real-time performance of learning algorithms, a data-based distributed control algorithm is proposed for multiagent systems using offline system interaction data sets. By utilizing the interaction data produced while a real-time system runs, the proposed algorithm improves system performance based on distributed policy gradient RL. Convergence and stability are guaranteed based on functional analysis and the Lyapunov method. Second, to address the asynchronous learning caused by differences in computational ability among agents, the proposed algorithm is extended to an asynchronous version in which each agent decides whether to execute policy improvement independently of its neighbors. Furthermore, an actor-critic structure containing two neural networks is developed to implement the proposed algorithm in both the synchronous and asynchronous cases. Based on the method of weighted residuals, the convergence and optimality of the neural networks are guaranteed by proving that the approximation errors converge to zero. Finally, simulations are conducted to show the effectiveness of the proposed algorithm.
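
The asynchronous idea in the abstract, where each agent improves its policy or not independently of its neighbors, can be illustrated with a toy sketch. This is not the article's algorithm: the three-agent path graph, the single-integrator dynamics, the scalar feedback gains `theta`, the quadratic local cost, the finite-difference gradient (a stand-in for a data-based policy gradient), the clipping range, and all function and parameter names are assumptions made only for illustration.

```python
import numpy as np

# Toy setup (all assumed): 3 agents on an undirected path graph 0-1-2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A  # graph Laplacian


def local_costs(theta, x0, horizon=20, r=0.1):
    """Roll out the assumed dynamics x(k+1) = x(k) + u(k) and return each
    agent's accumulated local cost sum_k (e_i^2 + r * u_i^2)."""
    x = x0.copy()
    J = np.zeros(len(x0))
    for _ in range(horizon):
        e = L @ x          # local neighborhood consensus error of each agent
        u = -theta * e     # each agent applies its own scalar gain
        J += e**2 + r * u**2
        x = x + u          # simulator stand-in for recorded interaction data
    return J


def async_policy_gradient(theta0, x0, update_prob,
                          iters=30, lr=0.01, eps=1e-4, seed=0):
    """At every iteration, agent i runs a gradient step on its own cost only
    if its independent random draw says so; the other agents keep their gains."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    for _ in range(iters):
        active = rng.random(len(theta)) < update_prob  # per-agent decision
        new_theta = theta.copy()
        for i in np.flatnonzero(active):
            d = np.zeros_like(theta)
            d[i] = eps
            # Central finite difference of agent i's own cost w.r.t. its gain.
            grad_i = (local_costs(theta + d, x0)[i]
                      - local_costs(theta - d, x0)[i]) / (2 * eps)
            # Clipping is a safeguard for this toy example only.
            new_theta[i] = np.clip(theta[i] - lr * grad_i, 0.0, 0.6)
        theta = new_theta
    return theta


if __name__ == "__main__":
    # Agent 0 always updates, agent 1 updates about half the time,
    # agent 2 never updates (e.g. it lacks the computational budget).
    gains = async_policy_gradient(np.full(3, 0.1),
                                  np.array([1.0, 0.0, -1.0]),
                                  np.array([1.0, 0.5, 0.0]))
    print(gains)
```

In this sketch an agent with update probability zero simply keeps its initial gain, while its neighbors continue to improve theirs; the sketch makes no claim about convergence or optimality, which the article establishes for its own algorithm.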

Adaptive Optimal Control of Discrete-Time Linear Systems with Discounted Value: Off-Policy Reinforcement Learning

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

Off-Policy Risk-Sensitive Reinforcement Learning-Based Constrained Robust Optimal Control

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks

Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints

Fuzzy-Based Adaptive Optimization of Unknown Discrete-Time Nonlinear Markov Jump Systems With Off-Policy Reinforcement Learning

Learning Optimal Control Policy for Unknown Discrete-Time Systems

Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems

Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application

Model-Based Safe Reinforcement Learning With Time-Varying Constraints: Applications to Intelligent Vehicles

Learning-based adaptive optimal control of linear time-delay systems: A value iteration approach

Adaptive Optimal Control for a Class of Continuous-Time Affine Nonlinear Systems with Unknown Internal Dynamics

Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles

Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees

Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off

NN Reinforcement Learning Adaptive Control for a Class of Nonstrict-Feedback Discrete-Time Systems

Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments

Off-Policy Reinforcement Learning for $ H_\infty $ Control Design

Data-Based Optimal Consensus Control for Multiagent Systems With Policy Gradient Reinforcement Learning