X-MOBILITY: End-To-End Generalizable Navigation via World Modeling

Wei Liu, Huihua Zhao, Chenran Li, Joydeep Biswas, Billy Okal, Pulkit Goyal, Yan Chang, Soha Pouya
2024-10-23
Abstract: General-purpose navigation in challenging environments remains a significant problem in robotics, with current state-of-the-art approaches facing myriad limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods face difficulties generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that overcomes existing challenges by leveraging three key ideas. First, X-Mobility employs an auto-regressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies: off-policy data allows the model to learn world dynamics, while on-policy data with supervisory control enables optimal action policy learning. Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses current state-of-the-art navigation approaches. Additionally, X-Mobility achieves zero-shot Sim2Real transferability and shows strong potential for cross-embodiment generalization.
Robotics
What problem does this paper attempt to address?
The paper addresses general-purpose robot navigation in complex environments, where current state-of-the-art methods have several limitations. Classical methods perform poorly in cluttered environments and require extensive parameter tuning, while learning-based methods generalize poorly to out-of-distribution environments. The paper introduces **X-Mobility**, an end-to-end generalizable navigation model built on three key ideas:
1. **Autoregressive World Modeling Architecture**: X-Mobility uses an autoregressive world modeling architecture with a latent state space to capture world dynamics.
2. **Multi-Head Decoders**: A diverse set of multi-head decoders lets the model learn a rich state representation that correlates strongly with effective navigation skills.
3. **Decoupling World Modeling from Action Policy**: Because world modeling is separated from the action policy, X-Mobility can be trained on a variety of data sources, both with and without expert policies. Off-policy data lets the model learn world dynamics, while on-policy data with supervisory control lets it learn the action policy.
Through extensive experiments, the authors show that X-Mobility not only generalizes effectively but also outperforms current state-of-the-art navigation methods. X-Mobility also achieves zero-shot simulation-to-reality (Sim2Real) transfer and shows strong potential for generalization across robot embodiments.
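The three ideas above can be illustrated with a toy sketch. This is not the authors' implementation: the scalar latent state, the fixed coefficients, the two decoder heads, and every name below (`Step`, `rollout_latents`, `decode_heads`, `compute_losses`) are assumptions made only for illustration. The point it tries to show is the decoupling: a world-modeling loss can be computed on any trajectory, while the policy loss is only computed on steps that carry an expert action.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Step:
    observation: float              # stand-in for an encoded sensor reading
    action: float                   # command that was actually executed
    expert_action: Optional[float]  # None when the data has no expert policy


def rollout_latents(steps: List[Step], z0: float = 0.0) -> List[float]:
    """Autoregressive latent update (toy version): each latent state depends on
    the previous latent, the previous action, and the current observation."""
    latents = []
    z, prev_action = z0, 0.0
    for step in steps:
        # Fixed coefficients are placeholders for learned parameters.
        z = math.tanh(0.5 * z + 0.3 * prev_action + 0.2 * step.observation)
        latents.append(z)
        prev_action = step.action
    return latents


def decode_heads(z: float) -> Dict[str, float]:
    """Multi-head decoding (toy version): several heads read the same latent."""
    return {
        "next_observation": 2.0 * z,  # hypothetical world-modeling head
        "policy_action": -1.0 * z,    # hypothetical action policy head
    }


def compute_losses(steps: List[Step]) -> Tuple[float, float, int]:
    """Return (world_loss, policy_loss, number_of_expert_steps)."""
    latents = rollout_latents(steps)
    world_loss, policy_loss, n_expert = 0.0, 0.0, 0
    for i, (step, z) in enumerate(zip(steps, latents)):
        heads = decode_heads(z)
        # World-modeling loss: usable on every trajectory, expert or not.
        if i + 1 < len(steps):
            world_loss += (heads["next_observation"] - steps[i + 1].observation) ** 2
        # Policy loss: only where an expert action supervises the policy head.
        if step.expert_action is not None:
            policy_loss += (heads["policy_action"] - step.expert_action) ** 2
            n_expert += 1
    return world_loss, policy_loss, n_expert


# Trajectory without an expert policy: contributes only to world modeling.
no_expert = [Step(1.0, 0.5, None), Step(0.8, -0.2, None), Step(0.6, 0.1, None)]
# Trajectory with expert supervision: contributes to both losses.
with_expert = [Step(1.0, 0.4, 0.4), Step(0.7, 0.3, 0.3), Step(0.5, 0.2, 0.2)]

print(compute_losses(no_expert))
print(compute_losses(with_expert))
```

In this sketch the first trajectory yields a policy loss of exactly zero because no step has an expert label, which mirrors the idea that data collected without an expert can still shape the world model.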