Abstract: Recent advancements in Model-Based Reinforcement Learning (MBRL) have made it a powerful tool for visual control tasks. Despite improved data efficiency, it remains challenging to train MBRL agents with generalizable perception. Training in the presence of visual distractions is particularly difficult because of the high variation they introduce into representation learning. Building on DREAMER, a popular MBRL method, we propose a simple yet effective auxiliary task to facilitate representation learning in distracting environments. Under the assumption that the task-relevant components of image observations are straightforward to identify with prior knowledge of a given task, we apply a segmentation mask to image observations so that only task-relevant components are reconstructed. In doing so, we greatly reduce the complexity of representation learning by removing the need to encode task-irrelevant objects in the latent representation. Our method, Segmentation Dreamer (SD), can be used either with ground-truth masks, which are easily accessible in simulation, or with potentially imperfect masks from segmentation foundation models. The latter variant is further improved by selectively applying the reconstruction loss, which avoids misleading learning signals caused by mask prediction errors. In modified DeepMind Control Suite (DMC) and Meta-World tasks with added visual distractions, SD achieves significantly better sample efficiency and higher final performance than prior work. We find that SD is especially helpful in sparse-reward tasks that are otherwise unsolvable by prior work, enabling the training of visually robust agents without the need for extensive reward engineering.
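The masked reconstruction idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `masked_target` and `reconstruction_loss`, the flattened four-pixel image, the mean-squared-error objective, and the per-pixel `trusted` weights are all assumptions made for illustration. In particular, the abstract does not say how the loss is applied selectively, so the weighting scheme below is only one hypothetical way to skip pixels whose predicted mask is considered unreliable.

```python
from typing import List, Optional, Sequence


def masked_target(image: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out every pixel the mask marks as task-irrelevant (mask == 0)."""
    if len(image) != len(mask):
        raise ValueError("image and mask must have the same length")
    return [pixel * m for pixel, m in zip(image, mask)]


def reconstruction_loss(
    prediction: Sequence[float],
    target: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean squared error; pixels with weight 0 contribute nothing."""
    if weights is None:
        weights = [1.0] * len(target)
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    squared = [w * (p - t) ** 2 for p, t, w in zip(prediction, target, weights)]
    return sum(squared) / total_weight


# Toy flattened grayscale observation (hypothetical values).
image = [0.9, 0.2, 0.7, 0.4]
mask = [1, 0, 1, 0]  # 1 = task-relevant pixel, 0 = distractor pixel

# The decoder is trained to reproduce only the task-relevant pixels.
target = masked_target(image, mask)
prediction = [0.8, 0.1, 0.7, 0.0]  # stand-in for a decoder output

# Loss over all pixels, as one would use with a ground-truth mask.
full_loss = reconstruction_loss(prediction, target)

# Hypothetical selective variant: pixel 1 is flagged as unreliable,
# so it is excluded from the loss instead of providing a misleading signal.
trusted = [1.0, 0.0, 1.0, 1.0]
selective_loss = reconstruction_loss(prediction, target, trusted)
```

Because distractor pixels are zeroed in the target, the decoder is never rewarded for modelling them, which is the source of the reduced representation-learning burden the abstract describes.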

Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control with Distractions
