Abstract: Although state-of-the-art cooperative control protocols can achieve formation tracking for multiple unmanned aerial vehicles (UAVs), they cannot guarantee optimal performance when complex disturbances act on the multi-UAV system. To address this, this work develops a feedforward-feedback learning-based optimal control method for cooperative UAV formation tracking under such disturbances. First, backstepping-based feedback control transforms the UAV formation tracking problem into an equivalent optimal regulation problem. A learning-based feedforward control scheme is then devised, in which a cooperative policy iteration algorithm is formulated on the basis of a two-player zero-sum game. A critic-only echo state network (ESN) approximates the optimal feedforward control policies; its online adaptive tuning law includes compensation terms that relax the persistency of excitation condition and remove the need for an initial admissible control. As a result, closed-loop stability is guaranteed in the sense that the tracking errors and ESN weights are uniformly ultimately bounded.

Note to Practitioners: In real-world flight, multiple UAVs are invariably affected by complex disturbances, which degrade tracking precision. There is an urgent need to strengthen disturbance rejection and ensure optimal performance in cooperative formation tracking of multiple UAVs. Reinforcement learning has shown promise in achieving robust control actions beyond the capabilities of model-based controllers. Introducing the cooperative policy iteration algorithm based on a two-player zero-sum game further optimizes the tracking performance of the UAV formation. To make reinforcement learning practical for UAV systems, the proposed algorithm addresses the persistency of excitation condition by incorporating compensation terms into the ESN tuning law. It also removes the requirement for an initial admissible control by introducing a novel piecewise compensation term into the ESN tuning law, which is based on a newly proposed Lyapunov function.
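
For orientation, a two-player zero-sum formulation of the kind the abstract refers to is commonly written as below. This is a generic textbook sketch, not the paper's exact formulation: the regulation error $e$, feedforward control $u$, disturbance $d$, weighting matrices $Q$ and $R$, attenuation level $\gamma$, and the affine error dynamics $\dot{e} = f(e) + g(e)u + k(e)d$ are all assumed symbols introduced here for illustration.

```latex
% Assumed cost: the control u minimizes it, the disturbance d maximizes it.
V(e(t)) = \int_{t}^{\infty} \left( e^{\top} Q e + u^{\top} R u - \gamma^{2} d^{\top} d \right) \mathrm{d}\tau ,
\qquad
V^{*}(e) = \min_{u} \max_{d} V(e)

% Hamilton-Jacobi-Isaacs equation, satisfied at the saddle point (u^*, d^*):
0 = e^{\top} Q e + u^{*\top} R u^{*} - \gamma^{2} d^{*\top} d^{*}
    + \nabla V^{*\top} \left( f(e) + g(e) u^{*} + k(e) d^{*} \right)

% Stationarity of the Hamiltonian in u and d gives the saddle-point policies:
u^{*} = -\tfrac{1}{2} R^{-1} g(e)^{\top} \nabla V^{*} ,
\qquad
d^{*} = \tfrac{1}{2\gamma^{2}} k(e)^{\top} \nabla V^{*}
```

Under these assumptions, policy iteration alternates between evaluating $V$ for the current pair $(u, d)$ and updating both policies from $\nabla V$. A critic-only scheme replaces $V$ with a single approximator, which the abstract states is an ESN, so that both policies follow from the critic's gradient.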

Robust Optimal Tracking Control for Multiplayer Systems by Off‐policy Q‐learning Approach

A Learning-Based Optimal Tracking Controller for Continuous Linear Systems with Unknown Dynamics: Theory and Case Study

Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS with Unidentified Exosystem Dynamics

Cooperative Path Following Control in Autonomous Vehicles Graphical Games: A Data-Based Off-Policy Learning Approach

Optimal Tracking Control of Nonlinear Multiagent Systems Using Internal Reinforce Q-Learning

Event‐triggered optimal tracking control of multiplayer unknown nonlinear systems via adaptive critic designs

Human-in-the-loop Distributed Cooperative Tracking Control with Applications to Autonomous Ground Vehicles: A Data-Driven Mixed Iteration Approach

Efficient off‐policy Q‐learning for multi‐agent systems by solving dual games

Optimal tracking cooperative control for multi-agent systems with periodic sampling via robust model predictive control approach

Adaptive Optimal Output-Feedback Consensus Tracking Control of Nonlinear Multiagent Systems Using Two-Player Stackelberg Game

Dynamic event-triggered robust optimal tracking control for multi-player nonzero-sum games with mismatched uncertainties and asymmetric constrained inputs

Online reinforcement learning control of unknown nonaffine nonlinear discrete time systems

Optimal Robust Online Tracking Control for Space Manipulator in Task Space Using Off-Policy Reinforcement Learning

Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method

Safe tracking in games: Achieving optimal control with unknown dynamics and constraints

Optimal trajectory tracking for uncertain linear discrete‐time systems using time‐varying Q‐learning

Learning-Based Optimal Cooperative Formation Tracking Control for Multiple UAVs: A Feedforward-Feedback Design Framework

Reinforcement learning for optimal tracking of large-scale systems with multitime scales

Model-Free Adaptive Optimal Control for Unknown Nonlinear Multiplayer Nonzero-Sum Game

Model-Free Optimal Tracking Design With Evolving Control Strategies via Q-Learning

Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning