Abstract: In pixel-based deep reinforcement learning (DRL), learning representations of states that change because of an agent's action or interaction with the environment is a critical challenge for improving data efficiency. Recent data-efficient DRL studies have integrated DRL with self-supervised learning (SSL) and data augmentation to learn state representations from given interactions. However, some methods have difficulty explicitly capturing evolving state representations or selecting data augmentations that preserve appropriate reward signals. Our goal is to explicitly learn the inherent dynamics that change with an agent's intervention and interaction with the environment. We propose masked and inverse dynamics modeling (MIND), which uses masking augmentation and fewer hyperparameters to learn agent-controllable representations in changing states. Our method consists of self-supervised multitask learning that leverages a transformer architecture to capture the spatiotemporal information underlying highly correlated consecutive frames. MIND performs self-supervised multitask learning with two tasks: masked modeling and inverse dynamics modeling. Masked modeling learns the static visual representation required for control in the state, and inverse dynamics modeling learns the rapidly evolving state representation with agent intervention. By integrating inverse dynamics modeling as a complementary component to masked modeling, our method effectively learns evolving state representations. We evaluate our method on discrete and continuous control environments with limited interactions. MIND outperforms previous methods across benchmarks and significantly improves data efficiency. The code is available at https://github.com/dudwojae/MIND.
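
The two-task objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes pixel-level random masking, a mean-squared-error loss for masked modeling, a cross-entropy loss over discrete actions for inverse dynamics modeling, and a simple weighted sum of the two; none of these choices are stated in the abstract, and all function and parameter names (`mask_frames`, `inverse_weight`, and so on) are hypothetical.

```python
import math
import random


def mask_frames(frames, mask_ratio, rng):
    """Hide a random subset of pixels in each flattened frame.

    Returns the masked copies and a 0/1 mask (1 = pixel was hidden).
    Assumption: independent per-pixel masking; the paper may mask differently.
    """
    masked, mask = [], []
    for frame in frames:
        hidden = [1 if rng.random() < mask_ratio else 0 for _ in frame]
        mask.append(hidden)
        masked.append([0.0 if h else px for px, h in zip(frame, hidden)])
    return masked, mask


def masked_modeling_loss(predicted, target):
    """Mean squared error between predicted and target representations."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def inverse_dynamics_loss(action_logits, action):
    """Cross-entropy of the action actually taken, given logits predicted
    from a pair of consecutive state representations."""
    peak = max(action_logits)
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in action_logits))
    return log_sum_exp - action_logits[action]


def multitask_loss(mask_loss, inverse_loss, inverse_weight=1.0):
    """Weighted sum of the two self-supervised losses (assumed form)."""
    return mask_loss + inverse_weight * inverse_loss


# Usage: mask three 4-pixel frames, then combine two example loss values.
rng = random.Random(0)
frames = [[0.1, 0.2, 0.3, 0.4] for _ in range(3)]
masked, mask = mask_frames(frames, mask_ratio=0.5, rng=rng)
total = multitask_loss(
    masked_modeling_loss([1.0, 2.0], [1.0, 4.0]),
    inverse_dynamics_loss([0.0, 0.0, 0.0, 0.0], action=2),
)
```

In a full system the predictions would come from an encoder and transformer applied to the masked frame sequence; here they are plain lists so that the loss arithmetic stays visible.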

Mask-based Latent Reconstruction for Reinforcement Learning

Sample-efficient multi-agent reinforcement learning with masked reconstruction

Masked contrastive representation learning for reinforcement learning

Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning

Learning Latent Dynamic Robust Representations for World Models

Unsupervised Representation Learning of Player Behavioral Data with Confidence Guided Masking

Lifelong Reinforcement Learning with Modulating Masks

RePreM: Representation Pre-training with Masked Model for Reinforcement Learning

Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning

MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning

Masked Autoencoding for Scalable and Generalizable Decision Making

Manifold-Based Reinforcement Learning via Locally Linear Reconstruction

Learning to Identify Critical States for Reinforcement Learning from Videos

Masked World Models for Visual Control

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

PointPatchRL -- Masked Reconstruction Improves Reinforcement Learning on Point Clouds

DMCL: Robot Autonomous Navigation Via Depth Image Masked Contrastive Learning

Why the Agent Made that Decision: Explaining Deep Reinforcement Learning with Vision Masks