Deep Reinforcement Learning Enables Joint Trajectory and Communication in Internet of Robotic Things
Ruyu Luo,Hui Tian,Wanli Ni,Julian Cheng,Kwang-Cheng Chen
DOI: https://doi.org/10.1109/twc.2024.3462450
IF: 10.4
2024-01-01
IEEE Transactions on Wireless Communications
Abstract: The Internet of Robotic Things (IoRT) integrates robotics, artificial intelligence computing, and communication technologies, enabling more sophisticated operations and decision-making. Mission-critical applications such as industrial manufacturing and emergency services are a crucial element of IoRT and impose stringent requirements on ultra-reliable and low-latency communication (URLLC). This paper addresses URLLC challenges in IoRT, particularly when autonomous mobile robots (AMRs) coexist with static sensors. We prioritize safe and efficient AMR travel through trajectory design and communication resource allocation in IoRT systems, without requiring any prior knowledge. To enhance network connectivity and exploit diversity gains, we introduce flexible decoding and free clustering as next-generation multiple access technologies in a spectrum-limited downlink IoRT system. Then, aiming to minimize the decoding error probability and travel time, we formulate a long-term multi-objective optimization problem that jointly designs the AMRs’ trajectories and communication resource allocation. To accommodate the inherent dynamics and unpredictability of the IoRT system, we introduce a multi-agent actor-critic deep reinforcement learning (DRL) framework with four distinct implementations, each accompanied by a comprehensive complexity analysis. Simulation results reveal the following insights: 1) in terms of DRL implementations, off-policy algorithms with deterministic policies outperform their on-policy counterparts, achieving approximately 67% higher rewards; 2) in terms of communication schemes, the proposed flexible decoding and free clustering strategies under the designed trajectories effectively reduce decoding errors; 3) in terms of algorithm optimality, our DRL framework shows superior flexibility and adaptability in communication environments compared to traditional A* search and heuristic methods.
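The abstract does not state the system model or the reward definition, so the following is only a minimal sketch of how a per-step objective combining decoding error probability and travel time might look. It assumes the widely used normal approximation for finite-blocklength coding over an AWGN channel, and a hypothetical weighted-sum reward; the function names, the weights `error_weight` and `time_weight`, and all numeric values are illustrative assumptions, not taken from the paper.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def decoding_error_probability(snr: float, blocklength: int, info_bits: int) -> float:
    """Normal approximation of the finite-blocklength decoding error (AWGN).

    Assumption: eps ~= Q((C - R) / sqrt(V / n)), with capacity C = log2(1 + snr),
    channel dispersion V = (1 - (1 + snr)^-2) * (log2 e)^2, and rate R = bits / n.
    """
    if snr <= 0.0:
        return 1.0  # no usable signal: treat every packet as lost
    capacity = math.log2(1.0 + snr)
    dispersion = (1.0 - (1.0 + snr) ** -2) * math.log2(math.e) ** 2
    rate = info_bits / blocklength
    return q_function((capacity - rate) / math.sqrt(dispersion / blocklength))


def step_reward(snr: float, blocklength: int, info_bits: int, step_time: float,
                error_weight: float = 1.0, time_weight: float = 0.1) -> float:
    """Hypothetical weighted-sum reward: penalize decoding error and elapsed time."""
    error = decoding_error_probability(snr, blocklength, info_bits)
    return -(error_weight * error + time_weight * step_time)


if __name__ == "__main__":
    # Illustrative values only: linear SNR of 10, 100 channel uses, 200 bits, 1 s step.
    print(step_reward(snr=10.0, blocklength=100, info_bits=200, step_time=1.0))
```

A weighted sum is only one way to scalarize a multi-objective problem; the weights trade reliability against travel time and would need tuning for any concrete scenario.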