Deep Reinforcement Learning with Multi-Critic TD3 for Decentralized Multi-Robot Path Planning

Heqing Yin, Chang Wang, Chao Yan, Xiaojia Xiang, Boliang Cai, Changyun Wei
DOI: https://doi.org/10.1109/tcds.2024.3368055
IF: 4.546
2024-01-01
IEEE Transactions on Cognitive and Developmental Systems
Abstract: Centralized multi-robot path planning is a prevalent approach involving a global planner computing feasible paths for each robot using shared information. Nonetheless, this approach encounters limitations due to communication constraints and computational complexity. To address these challenges, we introduce a novel decentralized multi-robot path planning approach that eliminates the need for sharing the states and intentions of robots. Our approach harnesses deep reinforcement learning and features an asynchronous multi-critic twin delayed deep deterministic policy gradient (AMC-TD3) algorithm, which enhances the original GRU-Attention based TD3 algorithm by incorporating a multi-critic network and employing an asynchronous training mechanism. By training each critic with a unique reward function, our learned policy enables each robot to navigate towards its long-term objective without colliding with other robots in complex environments. Furthermore, our reward function, grounded in social norms, allows the robots to naturally avoid each other in congested situations. Specifically, we train three critics to encourage each robot to achieve its long-term navigation goal, maintain its moving direction, and prevent collisions with other robots. Our model can learn an end-to-end navigation policy without relying on an accurate map or any localization information, rendering it highly adaptable to various environments. Simulation results reveal that our proposed approach surpasses baselines in several environments with different levels of complexity and robot populations.
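The three-critic idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reward shapes, the function names, the `safe_dist` threshold, and the weighted-sum combination of critic values are all assumptions made for illustration, and the social-norm term and the GRU-Attention networks are omitted. The only part taken from standard TD3 is the clipped double-Q target, which takes the minimum of two target Q-estimates.

```python
import math

# Hypothetical reward terms, one per critic (shapes are assumptions,
# not the reward functions defined in the paper).

def goal_reward(prev_dist, dist):
    """Positive when the robot has moved closer to its goal."""
    return prev_dist - dist

def heading_reward(prev_heading, heading):
    """Penalize a change in moving direction (radians, wrapped to [-pi, pi])."""
    diff = math.atan2(math.sin(heading - prev_heading),
                      math.cos(heading - prev_heading))
    return -abs(diff)

def collision_reward(min_robot_dist, safe_dist=0.5):
    """Penalize coming closer than safe_dist to another robot."""
    return -1.0 if min_robot_dist < safe_dist else 0.0

def td_targets(rewards, next_q_pairs, gamma=0.99, done=False):
    """One TD3-style target per critic: r_k + gamma * min(Q1', Q2').

    rewards      -- one scalar reward per critic
    next_q_pairs -- one (Q1', Q2') pair of target estimates per critic
    """
    mask = 0.0 if done else 1.0
    return [r + gamma * mask * min(q1, q2)
            for r, (q1, q2) in zip(rewards, next_q_pairs)]

def actor_objective(q_values, weights=(1.0, 1.0, 1.0)):
    """Assumed combination rule: weighted sum of the critics' Q-values."""
    return sum(w * q for w, q in zip(weights, q_values))

# Usage: build the three rewards for one transition, then the critic targets.
rewards = [
    goal_reward(prev_dist=2.0, dist=1.5),
    heading_reward(prev_heading=0.0, heading=0.0),
    collision_reward(min_robot_dist=0.2),
]
targets = td_targets(rewards, [(1.0, 2.0), (0.0, 0.0), (3.0, 2.0)], gamma=0.5)
```

Keeping the rewards separate lets each critic estimate the return for a single objective, so the trade-off between reaching the goal, holding a heading, and avoiding collisions is made only where the critic values are combined for the actor.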
robotics, computer science, artificial intelligence, neurosciences