Toward Safe Distributed Multi-Robot Navigation Coupled with Variational Bayesian Model
Lin Chen, Yaonan Wang, Zhiqiang Miao, Mingtao Feng, Zhen Zhou, Hesheng Wang, Danwei Wang
DOI: https://doi.org/10.1109/tase.2023.3346049
2024-01-01
Abstract: Designing a safe and effective collision avoidance policy for multiple robots is essential to safe operation in decentralized scenarios, where each robot is responsible for generating its own paths. Recently, reinforcement learning has yielded positive results in developing decentralized policies that enable multiple robots to move cooperatively and accomplish tasks. However, unsafe exploratory actions during reinforcement learning training result in inadequate safety. We seek to enhance the safety of distributed multi-robot navigation policies and propose a new imitation learning framework based on the variational Bayesian model, which enables robots to learn safe actions by anticipating the subsequent state they are expected to reach. In addition, we propose a new policy neural network structure for multi-robot navigation that introduces the transformer structure to encode the significance of nearby robots in relation to their forthcoming conditions. Experiments demonstrate that our policy guides robots more safely through multi-robot environments under limited information, outperforming the state-of-the-art RL-RVO method in terms of success rate.
Note to Practitioners: The motivation of this paper is to address the problem of collision avoidance in a multi-robot environment under limited information; the approach can also be applied to autonomous driving, crowd simulation, and other related fields. Reinforcement learning has produced positive results in creating decentralized policies that enable multiple robots to move cooperatively and complete tasks. However, ensuring safety remains challenging because hazardous actions may be explored during training. This article aims to enhance the safety of distributed policies that guide robots through navigation tasks in dynamic multi-robot environments.
First, we introduce a novel imitation learning framework based on the variational Bayesian model. This framework helps the policy learn safe actions, improving its performance and guiding the robot to navigate and avoid obstacles more safely. We propose a loss function that enables the policy to anticipate the future state the robot is expected to reach. By incorporating the transformer structure, we design a new neural network structure for multi-robot navigation that encodes the significance of nearby robots with respect to their upcoming conditions. This network structure employs BiGRUs (bidirectional gated recurrent units) so that the policy can assimilate observations from multiple agents. Compared to existing works such as GA3C-CADRL, SARL, and RL-RVO, our proposed method achieves a higher success rate. In future research, we will investigate methods to improve travel time and average speed while still strictly ensuring safe navigation. Furthermore, we plan to extend this approach to navigation in more densely populated multi-robot environments.
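As a rough illustration of what "encoding the significance of nearby robots" can mean in an attention-based network, the sketch below computes scaled dot-product attention weights of one robot over its neighbors. It is not the architecture from the paper: the function names, the two-dimensional feature vectors, and the omission of learned query/key/value projections are all simplifying assumptions made here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_attention(ego, neighbors):
    """Hypothetical helper: weight each neighbor by similarity to the ego robot.

    ego       -- feature vector of the robot making the decision (assumed)
    neighbors -- list of feature vectors, one per nearby robot (assumed)
    Returns the attention weights and the weighted sum of neighbor features.
    """
    d = len(ego)
    # Scaled dot-product score between the ego features and each neighbor.
    scores = [sum(q * k for q, k in zip(ego, n)) / math.sqrt(d) for n in neighbors]
    weights = softmax(scores)
    # Context vector: neighbor features averaged by their attention weights.
    context = [sum(w * n[i] for w, n in zip(weights, neighbors)) for i in range(d)]
    return weights, context


# Toy features (illustrative only): the first neighbor is most aligned with ego.
ego = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = neighbor_attention(ego, neighbors)
```

In this toy setting the weights sum to one and the neighbor whose features best match the ego robot receives the largest weight. A full transformer layer would additionally learn separate projections for queries, keys, and values, which this sketch leaves out.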