Abstract: We consider the problem of navigating a mobile robot towards a target in an unknown environment that is endowed with visual sensors, where neither the robot nor the sensors have access to global positioning information and both rely only on first-person-view images. To overcome the need for positioning, we train the sensors to encode and communicate relevant viewpoint information to the mobile robot, whose objective is to use this information to navigate to the target along the shortest path. We overcome the challenge of enabling all the sensors (even those that cannot directly see the target) to predict the direction along the shortest path to the target by implementing a neighborhood-based feature aggregation module using a Graph Neural Network (GNN) architecture. In our experiments, we first demonstrate generalizability to previously unseen environments with various sensor layouts. Our results show that by using communication between the sensors and the robot, we achieve up to a 2.0x improvement in SPL (Success weighted by Path Length) compared to a communication-free baseline. This is done without requiring a global map, positioning data, or pre-calibration of the sensor network. Second, we perform a zero-shot transfer of our model from simulation to the real world. Laboratory experiments demonstrate the feasibility of our approach in various cluttered environments. Finally, we showcase examples of successful navigation to the target while both the sensor network layout and the obstacles are dynamically reconfigured as the robot navigates. We provide a video demo, the dataset, trained models, and source code. Video: https://www.youtube.com/watch?v=kcmr6RUgucw Code: https://github.com/proroklab/sensor-guided-visual-nav
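
For readers unfamiliar with the SPL metric named in the abstract: it is commonly defined in the embodied navigation literature as the mean, over episodes, of S_i * l_i / max(p_i, l_i), where S_i is a binary success indicator, l_i is the shortest-path length to the target, and p_i is the length of the path the robot actually took. The sketch below assumes that standard definition. The function name and argument names are illustrative and are not taken from the authors' released code.

```python
from typing import Sequence


def success_weighted_path_length(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Mean over episodes of S_i * l_i / max(p_i, l_i).

    Assumes the commonly used SPL definition; all names here are hypothetical.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (len(shortest_lengths) == n and len(path_lengths) == n):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denominator = max(taken, shortest)
        # Failed episodes contribute zero; the guard also avoids dividing by zero
        # when the robot starts on the target (both lengths are 0).
        if success and denominator > 0:
            total += shortest / denominator
    return total / n


if __name__ == "__main__":
    # Three episodes: optimal success, success with a 2x detour, and a failure.
    score = success_weighted_path_length(
        [True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 15.0]
    )
    print(f"SPL = {score:.3f}")
```

Under this definition, a "2.0x improvement in SPL" means the ratio of the two methods' SPL scores, so it reflects both more frequent success and shorter paths on successful episodes.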

Monocular Camera-Based Point-Goal Navigation by Learning Depth Channel and Cross-Modality Pyramid Fusion

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

A Navigation Cognitive System Driven by Hierarchical Spiking Neural Network

StereoNavNet: Learning to Navigate using Stereo Cameras with Auxiliary Occupancy Voxels

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose

Image-Goal Navigation in Complex Environments via Modular Learning

Unsupervised Visual Odometry and Action Integration for PointGoal Navigation in Indoor Environment

3D-Aware Object Goal Navigation Via Simultaneous Exploration and Identification

BEVNav: Robot Autonomous Navigation Via Spatial-Temporal Contrastive Learning in Bird's-Eye View

Integrating Neural Radiance Fields End-to-End for Cognitive Visuomotor Navigation

Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation

Collaborative Learning of Depth Estimation, Visual Odometry and Camera Relocalization from Monocular Videos

Unifying Map and Landmark Based Representations for Visual Navigation

GA-Nav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments

Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill

On Deep Learning Techniques to Boost Monocular Depth Estimation for Autonomous Navigation

See What the Robot Can't See: Learning Cooperative Perception for Visual Navigation

Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation

Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing

Embodied Question Answering in Photorealistic Environments With Point Cloud Perception