Abstract: Unmanned aerial vehicle (UAV) swarms, in which multiple UAVs work together to achieve common objectives, have found extensive applications in various fields and play a crucial role in cluster collaboration. A key challenge in swarm operations is collision-free formation control. Deep reinforcement learning methods have received significant attention as a way to solve this problem, but applying them to autonomous UAVs poses several challenges, including dependence on global information during training, difficulty in sampling, and excessive resource consumption. To overcome these challenges, this work proposes a novel approach to collision-free formation control of UAV swarms based on multi-agent deep reinforcement learning (MARL). MARL allows each UAV to interact with a dynamic environment that includes the other UAVs, enabling collaborative decision-making and adaptive behavior. We build the state space of each UAV from local information. To train the policy network, we employ the multi-agent proximal policy optimization (MAPPO) algorithm, which supports robust learning and policy optimization in a multi-agent setting. We also address sampling difficulties and resource constraints with digital twin technology, which serves as a bridge between physical entities and virtual models and offers a novel approach to the intelligent collaborative control of drone swarms. By building models in virtual space, digital twin technology simulates real-world spaces, so the reinforcement learning algorithm can be pre-trained on synthetic experience. We construct multiple digital twin environments for interactive sampling and use them to pre-train the swarm on basic task capabilities. We then continue training with real-world data collected in actual environments, improving the swarm's performance in real-world scenarios.
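As a rough illustration of the training step, the sketch below implements the clipped surrogate objective of proximal policy optimization, which MAPPO applies across agents (commonly with one policy shared by all agents, each acting on its own local observation), together with generalized advantage estimation (GAE) over a critic's value estimates. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the abstract does not specify the network, reward, or hyperparameters, so the clip range `clip_eps=0.2`, discount `gamma=0.99`, and GAE parameter `lam=0.95` are common defaults assumed here, and the function names `gae` and `clipped_surrogate_loss` are hypothetical.

```python
import numpy as np


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one agent over one episode.

    rewards: shape (T,), per-step rewards.
    values:  shape (T + 1,), critic estimates V(s_t); the last entry is the
             bootstrap value after the final step (use 0.0 if terminal).
    Hyperparameter defaults are assumed, not taken from the paper.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    # Accumulate discounted TD residuals backwards in time.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def clipped_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped policy loss (to be minimized), averaged over agents and time.

    All inputs have shape (num_agents, T): log-probabilities of the taken
    actions under the current policy and under the policy that collected
    the data, plus the per-agent advantages.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (elementwise minimum) bound, negated for gradient descent.
    return -np.mean(np.minimum(unclipped, clipped))
```

The clipping keeps each update close to the policy that collected the samples: once the probability ratio leaves [1 - clip_eps, 1 + clip_eps] in the direction the advantage favors, that sample stops contributing gradient. In a multi-agent rollout, `gae` would be called once per agent and the results stacked into the `(num_agents, T)` advantage array.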
To evaluate the effectiveness of our approach, we compare the performance of the two-stage training architecture with that of other policy algorithms. In particular, to assess the sample efficiency of the on-policy MAPPO algorithm, we compare it against other policy algorithms, notably off-policy ones. The results show that MAPPO achieves superior sample efficiency and stability in collision-free formation control. Finally, we conduct a real-flight test to validate the practicality and reliability of the policy model trained in the digital twin environments. Overall, this work demonstrates that the proposed approach enables UAV swarms to navigate complex environments and achieve collision-free formation control.