Abstract: World models play a crucial role in decision-making within embodied environments, enabling cost-free exploration that would otherwise be expensive in the real world. To facilitate effective decision-making, world models must generalize well enough to support faithful imagination in out-of-distribution (OOD) regions, and they must provide reliable uncertainty estimation to assess the credibility of the simulated experiences. Both requirements present significant challenges for prior scalable approaches. This paper introduces WHALE, a framework for learning generalizable world models, consisting of two key techniques: behavior-conditioning and retracing-rollout. Behavior-conditioning addresses policy distribution shift, one of the primary sources of world model generalization error, while retracing-rollout enables efficient uncertainty estimation without requiring model ensembles. These techniques are universal and can be combined with any neural network architecture for world model learning. Incorporating these two techniques, we present Whale-ST, a scalable spatial-temporal transformer-based world model with enhanced generalizability. We demonstrate the superiority of Whale-ST in simulation tasks by evaluating both value estimation accuracy and video generation fidelity. Additionally, we examine the effectiveness of our uncertainty estimation technique, which enhances model-based policy optimization in fully offline scenarios. Furthermore, we propose Whale-X, a 414M-parameter world model trained on 970K trajectories from Open X-Embodiment datasets. We show that Whale-X exhibits promising scalability and strong generalizability in real-world manipulation scenarios using minimal demonstrations.

Towards Unraveling and Improving Generalization in World Models

ED2: an Environment Dynamics Decomposition Framework for World Model Construction

Learning Latent Dynamic Robust Representations for World Models

WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making

Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning

Learning World Models for Unconstrained Goal Navigation

Deep Neuroevolution of Recurrent and Discrete World Models

Quantifying Multimodality in World Models

Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization

Understanding What Affects the Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence

Language Models Meet World Models: Embodied Experiences Enhance Language Models

Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning

Evaluating World Models with LLM for Decision Making

ED2: Environment Dynamics Decomposition World Models for Continuous Control

Focus On What Matters: Separated Models For Visual-Based RL Generalization

Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models

A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning

Dropout's Dream Land: Generalization from Learned Simulators to Reality

Towards Understanding How to Reduce Generalization Gap in Visual Reinforcement Learning