Reinforcement Learning-Based Dynamic Coverage Control of Multi-Rotor UAVs with Safety Priority
Zhuangzhuang Ma, Junjie You, Yunlin Zhang, Yuhua Cheng, Jinliang Shao
DOI: https://doi.org/10.1109/tase.2024.3420094
IF: 6.636
2024-01-01
IEEE Transactions on Automation Science and Engineering
Abstract: This paper considers the dynamic coverage control of multi-rotor Unmanned Aerial Vehicles (UAVs) with limited sensing range, which aims to collect sensor information from all points of interest in a given task area until a prescribed coverage level is reached. Coverage tasks, however, often take place in unknown environments, where obstacles and communication interference affect the flight safety and communication stability of the UAVs. Collision avoidance and connectivity maintenance are therefore treated as the two safety issues in this paper: connectivity maintenance preserves the communication links that the UAVs need to accomplish the task collaboratively, and collision avoidance keeps the UAVs clear of obstacles and neighbors. To realize dynamic coverage control with safety constraints based on local environment information, this paper proposes a reinforcement learning-based algorithm with a shield, where the shield, designed using a discrete-time Control Barrier Function (CBF), not only ensures the safety of the UAVs in the learning and control phases but also maximizes their coverage performance. In addition, each UAV relies only on local information to generate safe actions that advance the coverage process during the execution phase. Finally, the effectiveness of the algorithm is verified by numerical simulations and a physical experiment. Note to Practitioners: A typical application scenario of dynamic coverage control is search and rescue (SAR), in which UAVs equipped with multiple sensors focus on monitoring areas where trapped people may be present, e.g., anomalous areas that infrared sensors detect because of human body temperature. Since SAR always occurs in unknown environments, it is crucial to ensure the safety of UAVs during missions; the safety issues considered in this paper are collision avoidance and connectivity maintenance.
To perform the SAR mission safely, we construct a CBF-based shield, which minimally corrects the exploration actions of the UAVs and ensures the safety of the cluster during the mission. In addition, it is difficult for UAVs to obtain global environmental information in unknown environments, so they rely only on their own sensors to collect local information. Therefore, the reinforcement learning algorithm with shield proposed in this paper adopts a centralized training and decentralized execution strategy, in which the UAVs need only local observations to plan their next actions. Physical experiments were also conducted to validate the feasibility of implementing the proposed algorithm on real UAVs.
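
The shield idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single UAV with single-integrator dynamics x_{k+1} = x_k + dt * u_k, one circular obstacle, a barrier function h(x) = ||x - o||^2 - r^2, and a finite set of candidate actions; the names barrier, step, is_safe, shield and the parameters dt and gamma are all hypothetical. An action is accepted when it satisfies a commonly used discrete-time CBF condition, h(x_{k+1}) >= (1 - gamma) * h(x_k) with 0 < gamma <= 1. If the action proposed by the learning policy violates the condition, the shield substitutes the closest safe candidate, which is one simple way to keep the correction small.

```python
import math

def barrier(pos, obstacle, radius):
    """h(x) = ||x - o||^2 - r^2; h >= 0 means the UAV is outside the obstacle."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    return dx * dx + dy * dy - radius * radius

def step(pos, action, dt):
    """Assumed single-integrator model: x_{k+1} = x_k + dt * u_k."""
    return (pos[0] + dt * action[0], pos[1] + dt * action[1])

def is_safe(pos, action, obstacle, radius, dt, gamma):
    """Discrete-time CBF condition: h(x_{k+1}) >= (1 - gamma) * h(x_k)."""
    h_now = barrier(pos, obstacle, radius)
    h_next = barrier(step(pos, action, dt), obstacle, radius)
    return h_next >= (1.0 - gamma) * h_now

def shield(pos, proposed, candidates, obstacle, radius, dt=0.1, gamma=0.5):
    """Return the proposed action if it is safe, else the closest safe candidate."""
    if is_safe(pos, proposed, obstacle, radius, dt, gamma):
        return proposed
    safe = [a for a in candidates
            if is_safe(pos, a, obstacle, radius, dt, gamma)]
    if not safe:
        # Hovering keeps h unchanged, so it satisfies the condition
        # whenever the current state is already safe (h >= 0).
        return (0.0, 0.0)
    # Smallest correction: the safe candidate nearest to the proposed action.
    return min(safe, key=lambda a: math.dist(a, proposed))
```

For example, with the UAV at (0, 0), an obstacle of radius 0.5 centered at (1, 0), and a proposed velocity (5, 0) that would carry the UAV onto the obstacle boundary in one step, the shield falls back to a slower candidate in the same direction if one is available. A multi-UAV version would add one barrier per neighbor for collision avoidance and a barrier on inter-UAV distance for connectivity maintenance; those extensions are not shown here.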