Abstract: Designing a safe and effective collision avoidance policy is essential in decentralized multi-robot scenarios, where each robot is responsible for generating its own path. Recently, reinforcement learning has been used with positive results to develop decentralized policies that enable multiple robots to move cooperatively and accomplish tasks. However, unsafe exploratory actions taken during reinforcement learning training result in inadequate safety. We seek to enhance the safety of distributed multi-robot navigation policies and propose a new imitation learning framework based on the variational Bayesian model, which enables robots to learn safe actions by anticipating the subsequent state they are expected to reach. In addition, we propose a new policy network structure for multi-robot navigation that introduces the transformer structure to encode the significance of nearby robots in relation to their forthcoming conditions. Experiments demonstrate that our policy guides robots more safely in multi-robot environments under limited information, outperforming the state-of-the-art RL-RVO method in terms of success rate.

Note to Practitioners: The motivation of this paper is to address the problem of collision avoidance in a multi-robot environment under limited information; the approach can also be applied to autonomous driving, crowd simulation, and other related fields. Reinforcement learning has produced positive results in creating decentralized policies that enable multiple robots to move cooperatively and complete tasks. However, ensuring adequate safety remains challenging because hazardous actions may be explored during training. This article aims to enhance the safety of distributed policies that guide robots through navigation tasks in dynamic multi-robot environments.
To begin with, we introduce a novel imitation learning framework based on the variational Bayesian model. This framework helps the policy learn safe actions, improving its performance and guiding the robot to navigate and avoid obstacles more safely. We propose a loss function that enables the policy to anticipate the future state the robot is expected to reach. By incorporating the transformer structure, we design a new neural network for multi-robot navigation that encodes the significance of nearby robots with respect to their upcoming conditions. This network employs BiGRUs so that the policy can process observations from multiple agents. Compared with existing works such as GA3C-CADRL, SARL, and RL-RVO, our proposed method achieves a higher success rate. In future research, we will investigate ways to improve travel time and average speed while still strictly ensuring safe navigation. Furthermore, we plan to extend this approach to navigation in more densely populated multi-robot environments.
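As a rough illustration of how a transformer-style layer can encode the significance of nearby robots, the sketch below computes scaled dot-product attention weights over a variable number of neighbors in plain Python. It is not the network from the paper: the function names, the two-dimensional toy vectors, and the use of hand-picked rather than learned query, key, and value vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_neighbors(query, neighbor_keys, neighbor_values):
    """Scaled dot-product attention for one ego robot (hypothetical helper).

    query           -- feature vector of the ego robot
    neighbor_keys   -- one key vector per nearby robot
    neighbor_values -- one value vector per nearby robot
    Returns (weights, context): one weight per neighbor, and the
    weighted sum of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in neighbor_keys
    ]
    weights = softmax(scores)
    dim = len(neighbor_values[0])
    context = [
        sum(w * value[i] for w, value in zip(weights, neighbor_values))
        for i in range(dim)
    ]
    return weights, context


# Toy example (assumed numbers): three neighbors. The first key is most
# aligned with the ego query, so it receives the largest weight.
ego_query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
values = [[0.5, 0.1], [0.0, 0.3], [-0.4, 0.0]]
weights, context = attend_to_neighbors(ego_query, keys, values)
```

The same function accepts any number of neighbors. In a trained policy the query, key, and value vectors would come from learned projections of each robot's observation rather than fixed numbers.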

Bayesian Reinforcement Learning for Multi-Robot Decentralized Patrolling in Uncertain Environments

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding.

Multi-Robot Learning Dynamic Obstacle Avoidance in Formation with Information-Directed Exploration.

Moving Forward in Formation: A Decentralized Hierarchical Learning Approach to Multi-Agent Moving Together

Balancing Efficiency and Unpredictability in Multi-robot Patrolling: A MARL-Based Approach.

An Energy-aware and Fault-tolerant Deep Reinforcement Learning based approach for Multi-agent Patrolling Problems

A deep reinforcement learning approach for multi-agent mobile robot patrolling

Multirobot Unknown Environment Exploration and Obstacle Avoidance Based on a Voronoi Diagram and Reinforcement Learning

Multi-Robot Stochastic Patrolling Via Graph Partitioning

Multi-Agent Reinforcement Learning-Based UAV Pathfinding for Obstacle Avoidance in Stochastic Environment

Toward Safe Distributed Multi-Robot Navigation Coupled with Variational Bayesian Model

Resilient and Adaptive Replanning for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

Multi-Robot Patrol: A Distributed Algorithm Based On Expected Idleness

Multi-robot Social-aware Cooperative Planning in Pedestrian Environments Using Multi-agent Reinforcement Learning

Sensor-based Multi-agent Coverage Control with Spatial Separation in Unstructured Environments

Multi-Robot Informative Path Planning for Efficient Target Mapping using Deep Reinforcement Learning

Multi-robot social-aware cooperative planning in pedestrian environments using attention-based actor-critic

An Upper Confidence Bound for Simultaneous Exploration and Exploitation in Heterogeneous Multi-Robot Systems

Multi-Robot Patrol with Continuous Connectivity and Assessment of Base Station Situation Awareness

Multi-Robot Patrolling with Sensing Idleness and Data Delay Objectives

Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios