Abstract: This article investigates the multirobot cooperative navigation problem based on raw visual observations. A fully end-to-end learning framework is presented. It uses graph neural networks to learn local motion coordination and deep reinforcement learning to generate a visuomotor policy, which lets each robot move to its goal without an environment map or global positioning information. Experimental results show that, with a few tens of robots, our approach achieves performance comparable to state-of-the-art imitation learning-based approaches that use bird's-eye-view state inputs. We also show that our approach generalizes to crowded and large environments and scales to ten times the number of robots used in training. In addition, our model trained for the multirobot case also improves the success rate in the single-robot navigation task in unseen environments.

Note to Practitioners: With the development of intelligent industrial and logistics systems, robotic transportation systems are widely deployed. However, existing multirobot path coordination and navigation approaches generally rely on unrealistic assumptions that are very hard to satisfy in practical scenarios. This article aims to advance the real-world application of learning-based multirobot cooperative navigation in three ways. First, we introduce an end-to-end reinforcement learning framework instead of the commonly used imitation learning strategy. Imitation learning requires exhaustive training data covering all scenarios and lacks the required generalizability. Second, we directly use raw sensor data instead of the commonly used bird's-eye-view semantic observations. Bird's-eye-view observations are generally not representative of practical application scenarios from the robot's perspective and cannot handle occlusion.
Third, we interpret our learned model to show which parts of the input and shared observations contribute most to the robots' final actions. This interpretability ensures the predictability, and thus the safety, of our visuomotor policy in practical applications. Our learned visuomotor policy can coordinate dozens of robots using only raw visual observations in unknown environments, without a map or global localization information. This is the first time this has been demonstrated in the literature. Our future work includes addressing the sim-to-real gap and conducting physical experiments.
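The abstract says graph neural networks learn *local* motion coordination among robots but does not give the architecture. The sketch below shows one generic way such a layer can work, and it is an assumption, not the authors' model. Each robot holds a feature vector, assumed to come from its own visual encoder. Robots within a hypothetical communication radius `comm_range` form a graph, and one message-passing step mixes each robot's feature with the mean of its neighbors' features. The names `build_comm_graph`, `message_passing_layer`, `w_self`, and `w_neigh` are all hypothetical. Positions are used only to decide which robots can communicate, as a simulator would. They are never given to the policy.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def build_comm_graph(positions: Sequence[Sequence[float]], comm_range: float) -> List[List[int]]:
    """For each robot, list the other robots within comm_range (hypothetical radius).

    Positions only model who can talk to whom; the policy never sees them.
    """
    n = len(positions)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(positions[i], positions[j]) <= comm_range:
                neighbors[i].append(j)
    return neighbors


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def message_passing_layer(features: List[Vector], neighbors: List[List[int]],
                          w_self: Matrix, w_neigh: Matrix) -> List[Vector]:
    """One message-passing step: h_i' = ReLU(W_self h_i + W_neigh * mean_{j in N(i)} h_j).

    Mean aggregation is order-independent and its weights do not depend on how
    many neighbors a robot has, so the same layer runs for any team size.
    """
    out: List[Vector] = []
    for i, h_i in enumerate(features):
        dim = len(h_i)
        if neighbors[i]:
            agg = [sum(features[j][k] for j in neighbors[i]) / len(neighbors[i])
                   for k in range(dim)]
        else:
            agg = [0.0] * dim  # isolated robot: no incoming messages
        self_term = matvec(w_self, h_i)
        neigh_term = matvec(w_neigh, agg)
        out.append([max(0.0, a + b) for a, b in zip(self_term, neigh_term)])
    return out


if __name__ == "__main__":
    positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # stand-ins for visual embeddings
    identity = [[1.0, 0.0], [0.0, 1.0]]
    half = [[0.5, 0.0], [0.0, 0.5]]
    nbrs = build_comm_graph(positions, comm_range=2.0)
    print(nbrs)  # [[1], [0], []]
    print(message_passing_layer(features, nbrs, identity, half))
```

In this toy run, robots 0 and 1 are within range of each other, so each absorbs half of the other's feature. Robot 2 is out of range and keeps only its own feature. Because the weights in a layer like this are shared across robots, a policy trained with one team size can at least be *run* with a larger team. Whether it performs well at that scale is the empirical question the abstract addresses.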

Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge

ChatNav: Leveraging LLM to Zero-shot Semantic Reasoning in Object Navigation

Visual Semantic Navigation using Scene Priors

Collaborative Visual Navigation

Learning Navigational Visual Representations with Semantic Map Supervision

Learning Efficient Multi-Agent Cooperative Visual Exploration

Multi-Object Navigation Using Potential Target Position Policy Function

Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation

Multi goals and multi scenes visual mapless navigation in indoor using meta-learning and scene priors

CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations

Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation using Large Language Models

Vision-and-Language Navigation via Latent Semantic Alignment Learning

Learning Autonomous Exploration and Mapping with Semantic Vision

Visuomotor Reinforcement Learning for Multirobot Cooperative Navigation

A Hybrid Approach to Real-Time Robotic Visual Navigation: Integrating Detection and Scene Segmentation

Interactive Semantic Map Representation for Skill-Based Visual Object Navigation

Visual Representations for Semantic Target Driven Navigation

Multi-Object Navigation in real environments using hybrid policies

Learning a Semantic Prior for Guided Navigation

Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation