Collaborative Reinforcement Learning Based Unmanned Aerial Vehicle (UAV) Trajectory Design for 3D UAV Tracking

Yujiao Zhu, Mingzhe Chen, Sihua Wang, Ye Hu, Yuchen Liu, Changchuan Yin
2024-01-23
Abstract: In this paper, the problem of using one active unmanned aerial vehicle (UAV) and four passive UAVs to localize a target UAV in 3D space in real time is investigated. In the considered model, the active UAV transmits signals that are reflected by the target UAV and received by each passive UAV. From the received reflection signals, each passive UAV estimates the signal transmission distance and sends it to a base station (BS), which estimates the position of the target UAV. Because the target UAV moves, the active UAV and each passive UAV must optimize their trajectories to continuously localize it. Meanwhile, since the accuracy of the distance estimation depends on the signal-to-noise ratio of the transmitted signals, the active UAV must also optimize its transmit power. This problem is formulated as an optimization problem whose goal is to jointly optimize the transmit power of the active UAV and the trajectories of both the active and passive UAVs so as to maximize the target UAV positioning accuracy. To solve this problem, a Z function decomposition based reinforcement learning (ZD-RL) method is proposed. Compared to value function decomposition based RL (VD-RL), the proposed method finds the probability distribution of the sum of future rewards and uses it to estimate the expected value of that sum more accurately. This yields a better transmit power for the active UAV and better trajectories for both the active and passive UAVs, which improves the target UAV positioning accuracy. Simulation results show that the proposed ZD-RL method can reduce the positioning errors by up to 39.4% and 64.6%, compared to VD-RL and independent deep RL methods, respectively.
Multiagent Systems, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of localizing a target unmanned aerial vehicle (UAV) in three-dimensional space in real time through the cooperation of one active UAV and four passive UAVs. Specifically, the active UAV transmits signals toward the target UAV, and the target UAV reflects these signals to the four passive UAVs. Each passive UAV uses the received reflected signals to estimate the signal transmission distance and sends this estimate to the base station (BS). The BS then estimates the position of the target UAV from these distance estimates. Because the target UAV moves, the active UAV and each passive UAV must optimize their trajectories to keep localizing it. At the same time, because the accuracy of the distance estimation depends on the signal-to-noise ratio (SNR) of the transmitted signals, the active UAV must also optimize its transmit power. The problem is therefore formulated as an optimization problem that jointly optimizes the transmit power of the active UAV and the trajectories of the active and passive UAVs to maximize the positioning accuracy of the target UAV. To solve it, the paper proposes a Z function decomposition based reinforcement learning (ZD-RL) method. Compared with the value function decomposition based reinforcement learning (VD-RL) method, ZD-RL finds the probability distribution of the sum of future rewards and can therefore estimate the expected value of that sum more accurately. This leads to a better transmit power for the active UAV and better trajectories for the active and passive UAVs, which improves the positioning accuracy of the target UAV. Simulation results show that the proposed ZD-RL method can reduce the positioning error by up to 39.4% and 64.6% compared with VD-RL and the independent deep reinforcement learning method, respectively.
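
To illustrate the position-estimation step at the BS, here is a minimal sketch in Python. It assumes that each reported signal transmission distance equals the active-to-target distance plus the target-to-passive distance, and that all UAV positions are known at the BS. The summary above does not say which estimator the paper uses, so a plain Gauss-Newton least-squares solver is substituted here as one plausible choice. The function names (`path_length`, `solve3`, `locate_target`) and all coordinates are hypothetical, and the measurements are noise-free.

```python
import math


def _sub(u, v):
    """Component-wise difference u - v of two 3D points."""
    return [ui - vi for ui, vi in zip(u, v)]


def _norm(u):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(ui * ui for ui in u))


def path_length(target, active, passive):
    """Assumed measurement model: active -> target -> passive distance."""
    return _norm(_sub(target, active)) + _norm(_sub(target, passive))


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def locate_target(active, passives, distances, guess, iters=20):
    """Gauss-Newton fit of the target position to the reported distances.

    Hypothetical estimator, not taken from the paper. It fails if the
    iterate coincides with a UAV position (division by zero).
    """
    x = list(guess)
    for _ in range(iters):
        da = _sub(x, active)
        na = _norm(da)
        JTJ = [[0.0] * 3 for _ in range(3)]
        JTr = [0.0] * 3
        for p, d in zip(passives, distances):
            dp = _sub(x, p)
            n_p = _norm(dp)
            # Gradient of the path length with respect to the target position.
            g = [da[i] / na + dp[i] / n_p for i in range(3)]
            r = na + n_p - d  # residual: predicted minus reported distance
            for i in range(3):
                JTr[i] += g[i] * r
                for j in range(3):
                    JTJ[i][j] += g[i] * g[j]
        step = solve3(JTJ, [-v for v in JTr])
        x = [xi + si for xi, si in zip(x, step)]
        if _norm(step) < 1e-9:
            break
    return x


# Illustrative geometry in metres (made-up values, not from the paper).
active = [0.0, 0.0, 300.0]
passives = [
    [200.0, 0.0, 50.0],
    [0.0, 200.0, 150.0],
    [-200.0, 0.0, 60.0],
    [0.0, -200.0, 200.0],
]
true_target = [30.0, 40.0, 120.0]
distances = [path_length(true_target, active, p) for p in passives]

# Start the search from the centroid of the passive UAVs.
guess = [sum(p[i] for p in passives) / len(passives) for i in range(3)]
estimate = locate_target(active, passives, distances, guess)
print([round(v, 3) for v in estimate])
```

The passive UAVs in this sketch are deliberately placed at different altitudes. If all UAVs were close to a common plane, the vertical coordinate of the target would be poorly determined, which is one intuition for why the UAV trajectories affect positioning accuracy. With noisy distance estimates, the same solver would return an approximate position whose error grows as the SNR drops.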