Safety Constrained Trajectory Optimization for Completion Time Minimization for UAV Communications

Tao Wang, Wenbo Du, Chunxiao Jiang, Yumeng Li, Haijun Zhang
DOI: https://doi.org/10.1109/jiot.2024.3355906
IF: 10.6
2024-01-01
IEEE Internet of Things Journal
Abstract: In recent years, unmanned aerial vehicles (UAVs) have been considered for integration into wireless communication systems because of their advantages in mobility, cost, maneuverability, etc. In some real UAV-assisted communication scenarios, the dynamics of the environment, such as the roaming of served users, make it hard to obtain an optimal trajectory before the UAV is dispatched. Embedding an intelligent control policy in the UAV for distributed task execution is therefore necessary to complete the task. In this paper, a UAV trajectory design problem is investigated for an orthogonal-frequency-division-multiplexing (OFDM) wireless sensor network, which is dynamic because mobile sensors may randomly roam within a certain range. The UAV is expected to balance task efficiency against the safety constraint using a pre-trained onboard control policy. Compared to prior works, this work requires the policy to adapt to randomly generated obstacle maps, and also assumes that the UAV has no prior knowledge of the obstacles before it is dispatched, which makes the problem more challenging. The motivation comes from adversarial environments whose specific obstacle distribution is not known beforehand, such as a disaster area. The problem is formulated as a constrained Markov decision process (CMDP) model, which, unlike a basic MDP, incorporates the safety constraint. Due to the assumption of a randomized obstacle distribution and the lack of prior knowledge, existing algorithms for CMDPs cannot be applied directly. To tackle this issue, we enhance a reinforcement learning (RL) algorithm with a safety control mechanism to derive our novel safe reinforcement learning (Safe RL) algorithm, which is based on the framework of the Lagrangian method. Compared to earlier CMDP algorithms, our algorithm removes the premise that the safety model is known: the agent learns safety judgement from scratch through its interactions with the environment. Simulation results demonstrate that our proposed algorithm outperforms the benchmark algorithm under the problem’s setup.
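For readers unfamiliar with the Lagrangian framework mentioned in the abstract, the following is a minimal, generic sketch of Lagrangian relaxation for a CMDP. It is not the authors' algorithm. The function names, the learning rate, the cost limit, and the episode costs are all hypothetical and chosen only for illustration. The idea is that a multiplier turns the safety constraint into a penalty on the reward, and that the multiplier itself is adjusted by dual ascent according to how far the observed safety cost is from its limit.

```python
def lagrangian_objective(reward_return, cost_return, lam, cost_limit):
    """Scalar objective a policy would maximize for a fixed multiplier lam.

    The constraint "cost_return <= cost_limit" is folded into the reward
    as a penalty weighted by lam (hypothetical names, generic form).
    """
    return reward_return - lam * (cost_return - cost_limit)


def update_multiplier(lam, cost_return, cost_limit, lr=0.05):
    """Dual ascent step on the multiplier.

    lam grows when the observed cost exceeds the limit and shrinks
    otherwise; it is clipped so that it never becomes negative.
    """
    return max(0.0, lam + lr * (cost_return - cost_limit))


# Toy usage with made-up per-episode safety costs (assumed values,
# e.g. a count of safety violations per episode).
lam = 0.0
cost_limit = 1.0
for episode_cost in [3.0, 2.0, 1.5, 0.5, 0.0]:
    lam = update_multiplier(lam, episode_cost, cost_limit)
```

In this toy run the multiplier rises while the episode cost is above the limit and falls once the cost drops below it, which is the balancing behaviour that a Lagrangian-based safe RL method relies on. In a full method the policy update and the cost estimate would come from the RL algorithm itself; here they are replaced by fixed numbers.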
computer science, information systems; telecommunications; engineering, electrical & electronic