X-MOBILITY: End-To-End Generalizable Navigation via World Modeling

Wei Liu, Huihua Zhao, Chenran Li, Joydeep Biswas, Billy Okal, Pulkit Goyal, Yan Chang, Soha Pouya

2024-10-23

Abstract: General-purpose navigation in challenging environments remains a significant problem in robotics, with current state-of-the-art approaches facing myriad limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods face difficulties generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that overcomes existing challenges by leveraging three key ideas. First, X-Mobility employs an auto-regressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies: off-policy data allows the model to learn world dynamics, while on-policy data with supervisory control enables optimal action policy learning. Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses current state-of-the-art navigation approaches. Additionally, X-Mobility achieves zero-shot Sim2Real transferability and shows strong potential for cross-embodiment generalization.

Robotics

What problem does this paper attempt to address?

The paper addresses general-purpose robot navigation in complex environments, where current state-of-the-art methods have several limitations. Classical methods perform poorly in cluttered environments and require extensive parameter tuning, while learning-based methods struggle to generalize to out-of-distribution environments. The paper introduces **X-Mobility**, an end-to-end generalizable navigation model that addresses these challenges through three key ideas:

1. **Autoregressive World Modeling Architecture**: X-Mobility uses an autoregressive world-modeling architecture with a latent state space to capture world dynamics.
2. **Multi-Head Decoders**: A diverse set of multi-head decoders lets the model learn a rich state representation that correlates strongly with effective navigation skills.
3. **Decoupling World Modeling and Action Policy**: Because world modeling is decoupled from the action policy, X-Mobility can be trained on a variety of data sources, collected both with and without expert policies. Off-policy data lets the model learn world dynamics, while on-policy data with supervisory control is used to learn the action policy.

Through extensive experiments, the authors show that X-Mobility not only generalizes effectively but also outperforms current state-of-the-art navigation methods on multiple metrics. In addition, X-Mobility achieves zero-shot simulation-to-reality (Sim2Real) transfer and shows strong potential for generalizing across different robot embodiments.
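The decoupling of world modeling from the action policy can be illustrated with a small toy sketch. This is not the authors' implementation: the scalar "model", the two decoder heads (`recon`, `next_obs`), the parameter values, and every name below are hypothetical, chosen only to show how data without expert labels can still supervise a latent world model while only expert-labelled steps supervise the policy.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Step:
    obs: float                             # toy 1-D observation
    action: float                          # action actually executed
    expert_action: Optional[float] = None  # None when no expert label exists


@dataclass
class ToyModel:
    # World-model parameters: recurrent latent update plus two decoder heads.
    # All values are arbitrary placeholders (assumption, not from the paper).
    w_h: float = 0.5
    w_o: float = 0.5
    w_a: float = 0.1
    head_recon: float = 1.0      # head 1: reconstructs the current observation
    head_next_obs: float = 1.0   # head 2: predicts the next observation
    # Policy parameter, kept separate from the world-model parameters.
    w_pi: float = 0.3

    def step_latent(self, h: float, obs: float, prev_action: float) -> float:
        # Autoregressive update: the new latent depends on the previous latent,
        # the previous action, and the current observation.
        return math.tanh(self.w_h * h + self.w_o * obs + self.w_a * prev_action)

    def decode(self, h: float) -> Dict[str, float]:
        # Several heads read the same latent state.
        return {"recon": self.head_recon * h, "next_obs": self.head_next_obs * h}

    def policy(self, h: float) -> float:
        # The policy only sees the latent state, not the raw observation.
        return self.w_pi * h


def losses(model: ToyModel, traj: List[Step]) -> Tuple[float, float, int]:
    """Return (world_loss, policy_loss, number_of_expert_labelled_steps)."""
    h, prev_action = 0.0, 0.0
    world_loss, policy_loss, n_expert = 0.0, 0.0, 0
    for t, step in enumerate(traj):
        h = model.step_latent(h, step.obs, prev_action)
        heads = model.decode(h)
        # Every step supervises the world model, expert or not.
        world_loss += (heads["recon"] - step.obs) ** 2
        if t + 1 < len(traj):
            world_loss += (heads["next_obs"] - traj[t + 1].obs) ** 2
        # Only expert-labelled steps supervise the policy.
        if step.expert_action is not None:
            policy_loss += (model.policy(h) - step.expert_action) ** 2
            n_expert += 1
        prev_action = step.action
    return world_loss, policy_loss, n_expert


# Data collected without an expert (e.g. random actions) vs. with an expert.
random_walk = [Step(0.1, 0.4), Step(0.3, -0.2), Step(0.2, 0.1)]
expert_run = [Step(0.1, 0.2, 0.2), Step(0.3, 0.1, 0.1), Step(0.4, 0.0, 0.0)]
```

Calling `losses(ToyModel(), random_walk)` yields a non-zero world-model loss and a policy loss of exactly zero, whereas `expert_run` contributes to both terms. In a real system the two terms would drive gradient updates of separate parameter groups; that training loop is omitted here.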

CrowdMove: Autonomous Mapless Navigation in Crowded Scenarios

A LiDAR Based End to End Controller for Robot Navigation Using Deep Neural Network

Exploitation-Guided Exploration for Semantic Embodied Navigation

Learning World Transition Model for Socially Aware Robot Navigation

Navigation World Models

Robust Navigation with Cross-Modal Fusion and Knowledge Transfer

NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration

Multi-Object Navigation in real environments using hybrid policies

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

Learning with a Mole: Transferable latent spatial representations for navigation without reconstruction

Image-Goal Navigation in Complex Environments via Modular Learning

Robot Navigation with Map-Based Deep Reinforcement Learning

Towards Learning a Generalist Model for Embodied Navigation

Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

CrowdGAIL: A spatiotemporal aware method for agent navigation

Context vector-based visual mapless navigation in indoor using hierarchical semantic information and meta-learning

Deep reinforcement learning-aided autonomous navigation with landmark generators

Learning to Navigate for Mobile Robot with Continual Reinforcement Learning

Multigoal Visual Navigation With Collision Avoidance via Deep Reinforcement Learning