Learning Latent Dynamic Robust Representations for World Models

Ruixiang Sun, Hongyu Zang, Xin Li, Riashat Islam

2024-05-30

Abstract: Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate an agent's knowledge about the underlying dynamics of the environment, enabling learning a world model as a useful planner. However, top MBRL agents such as Dreamer often struggle with visual pixel-based inputs in the presence of exogenous or irrelevant noise in the observation space, due to failure to capture task-specific features while filtering out irrelevant spatio-temporal details. To tackle this problem, we apply a spatio-temporal masking strategy, a bisimulation principle, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models, effectively eliminating non-essential information. Joint training of representations, dynamics, and policy often leads to instabilities. To further address this issue, we develop a Hybrid Recurrent State-Space Model (HRSSM) structure, enhancing state representation robustness for effective policy learning. Our empirical evaluation demonstrates significant performance improvements over existing methods in a range of visually complex control tasks such as Maniskill \cite{gu2023maniskill2} with exogenous distractors from the Matterport environment. Our code is available at https://github.com/bit1029public/HRSSM.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the poor performance of visual model-based reinforcement learning (MBRL) in environments whose observations contain exogenous noise or task-irrelevant information. Existing MBRL agents such as Dreamer degrade on pixel-based inputs because they fail to capture task-relevant features while filtering out irrelevant spatial and temporal details. To solve this, the authors combine a spatio-temporal masking strategy, a bisimulation principle, and latent reconstruction so that the world model captures the endogenous, task-relevant aspects of the environment and discards non-essential information. Because joint training of representations, dynamics, and policy often leads to instabilities, the authors also develop a Hybrid Recurrent State-Space Model (HRSSM) structure that makes the state representation more robust and thereby supports effective policy learning. The empirical evaluation reports significant performance improvements over existing methods across a range of visually complex control tasks, such as Maniskill with exogenous distractors from the Matterport environment.
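As a rough illustration of what a spatio-temporal masking strategy can look like, the sketch below zeroes out randomly chosen space-time cubes of a short single-channel clip. It is not the authors' implementation: the function name `cube_mask`, the cube sizes, the mask ratio, and the use of nested Python lists instead of tensors are all assumptions made for this example.

```python
import random


def cube_mask(frames, cube_t=2, cube_h=2, cube_w=2, mask_ratio=0.5, seed=0):
    """Zero out randomly chosen space-time cubes of a clip.

    frames: nested list indexed as frames[t][y][x] (single-channel clip).
    Returns a masked copy of the clip and the origins of the masked cubes.
    All parameter names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    T, H, W = len(frames), len(frames[0]), len(frames[0][0])

    # Enumerate cubes by their (time, row, column) origin.
    cubes = [(t, y, x)
             for t in range(0, T, cube_t)
             for y in range(0, H, cube_h)
             for x in range(0, W, cube_w)]

    # Pick a fixed fraction of the cubes to hide.
    n_masked = int(len(cubes) * mask_ratio)
    masked = rng.sample(cubes, n_masked)

    # Copy the clip so the input is left untouched.
    out = [[row[:] for row in frame] for frame in frames]
    for t0, y0, x0 in masked:
        for t in range(t0, min(t0 + cube_t, T)):
            for y in range(y0, min(y0 + cube_h, H)):
                for x in range(x0, min(x0 + cube_w, W)):
                    out[t][y][x] = 0.0
    return out, masked


# A toy clip with T=4 frames of 4x4 pixels, all set to 1.0.
clip = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
masked_clip, masked_cubes = cube_mask(clip)
```

Masking whole cubes rather than independent pixels hides the same region across consecutive frames, so an encoder cannot trivially recover the hidden content from a neighbouring frame. In a setup like the one the abstract describes, the masked clip would then be encoded and its latent compared against a target computed from the unmasked clip; that step is omitted here.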

DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations

Masked World Models for Visual Control

ED2: an Environment Dynamics Decomposition Framework for World Model Construction

Learning a World Model With Multitimescale Memory Augmentation

Predictive Experience Replay for Continual Visual Control and Forecasting

HarmonyDream: Task Harmonization Inside World Models

Mask-based Latent Reconstruction for Reinforcement Learning

Harmony World Models: Boosting Sample Efficiency for Model-based Reinforcement Learning

Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning

Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning

MuDreamer: Learning Predictive World Models without Reconstruction

Policy-shaped prediction: avoiding distractions in model-based reinforcement learning

ED2: Environment Dynamics Decomposition World Models for Continuous Control

SafeDreamer: Safe Reinforcement Learning with World Models

ReCoRe: Regularized Contrastive Representation Learning of World Model

Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization

RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability

Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction

Mastering Memory Tasks with World Models