Abstract: In pixel-based deep reinforcement learning (DRL), learning representations of states that change because of an agent's actions or interactions with the environment is a critical challenge for improving data efficiency. Recent data-efficient DRL studies have combined DRL with self-supervised learning (SSL) and data augmentation to learn state representations from given interactions. However, some of these methods have difficulty explicitly capturing evolving state representations or selecting data augmentations for appropriate reward signals. Our goal is to explicitly learn the inherent dynamics that change with an agent's intervention and interaction with the environment. We propose masked and inverse dynamics modeling (MIND), which uses masking augmentation and requires fewer hyperparameters to learn agent-controllable representations in changing states. Our method consists of self-supervised multitask learning built on a transformer architecture, which captures the spatiotemporal information underlying highly correlated consecutive frames. MIND performs this self-supervised multitask learning with two tasks: masked modeling and inverse dynamics modeling. Masked modeling learns the static visual representation of the state required for control, and inverse dynamics modeling learns the rapidly evolving state representation under agent intervention. By integrating inverse dynamics modeling as a complement to masked modeling, our method effectively learns evolving state representations. We evaluate our method on discrete and continuous control environments with limited interactions. MIND outperforms previous methods across benchmarks and significantly improves data efficiency. The code is available at https://github.com/dudwojae/MIND.
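To make the two-task objective concrete, below is a minimal Python sketch of how a masked-modeling loss and an inverse-dynamics loss might be combined into one self-supervised multitask objective. It is not the authors' implementation (see the repository linked above for that). Several assumptions are made for illustration: frames are flattened into plain float lists, the transformer encoder is omitted so predicted and target latents are passed in directly, mean squared error stands in for the masked-modeling reconstruction objective, and inverse dynamics is a linear head with softmax cross-entropy over discrete actions. The names `random_mask`, `inverse_dynamics_logits`, `multitask_loss`, and `idm_weight` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def random_mask(frames: Sequence[Vector], mask_ratio: float, rng: random.Random) -> List[Vector]:
    """Masking augmentation (assumed form): zero out a random subset of values in each frame."""
    masked = []
    for frame in frames:
        k = int(round(mask_ratio * len(frame)))
        hidden = set(rng.sample(range(len(frame)), k))
        masked.append([0.0 if i in hidden else x for i, x in enumerate(frame)])
    return masked


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error, a stand-in for the masked-modeling reconstruction objective."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def inverse_dynamics_logits(z_t: Vector, z_next: Vector, weights: Sequence[Vector]) -> Vector:
    """Hypothetical linear head: one logit per action from the concatenated pair (z_t, z_{t+1})."""
    pair = list(z_t) + list(z_next)
    return [sum(w * x for w, x in zip(row, pair)) for row in weights]


def cross_entropy(logits: Vector, target: int) -> float:
    """Softmax cross-entropy for a discrete action label, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def multitask_loss(
    pred_latents: Sequence[Vector],
    target_latents: Sequence[Vector],
    latents: Sequence[Vector],
    actions: Sequence[int],
    head_weights: Sequence[Vector],
    idm_weight: float = 1.0,
) -> float:
    """Masked-modeling term averaged over frames plus a weighted inverse-dynamics term over transitions."""
    masked_term = sum(mse(p, t) for p, t in zip(pred_latents, target_latents)) / len(pred_latents)
    # Transition t pairs latents[t] with latents[t + 1] and is labeled with actions[t].
    idm_terms = [
        cross_entropy(inverse_dynamics_logits(latents[t], latents[t + 1], head_weights), actions[t])
        for t in range(len(actions))
    ]
    return masked_term + idm_weight * sum(idm_terms) / len(idm_terms)


if __name__ == "__main__":
    rng = random.Random(0)
    frames = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
    print(random_mask(frames, mask_ratio=0.5, rng=rng))  # two of four values per frame set to 0.0
    latents = [[1.0, 0.0], [0.0, 1.0]]
    head = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    # Perfect reconstruction and uninformative action logits, so the loss is about log(2).
    print(multitask_loss(latents, latents, latents, [0], head))
```

The inverse-dynamics term only sees consecutive pairs, so it rewards latents that encode what the action changed, while the masked term rewards latents that recover the frame's content; summing them mirrors the complementary roles the abstract describes.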

Pretraining Representations for Data-Efficient Reinforcement Learning

Content Classification Tasks with Data Preprocessing Manifestations

Pre-training with Non-expert Human Demonstration for Deep Reinforcement Learning

Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning

Become a Proficient Player with Limited Data through Watching Pure Videos

Pre-training Neural Networks with Human Demonstrations for Deep Reinforcement Learning

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

A Data-Efficient Training Method for Deep Reinforcement Learning

Accelerating exploration and representation learning with offline pre-training

Reinforcement Learning with Unsupervised Auxiliary Tasks

Pretraining & Reinforcement Learning: Sharpening the Axe Before Cutting the Tree

Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

A Data-efficiency Training Framework for Deep Reinforcement Learning

Efficient Online Reinforcement Learning with Offline Data

Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments

The Role of Pretrained Representations for the OOD Generalization of Reinforcement Learning Agents

Mastering Atari Games with Limited Data

Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning

Task-Induced Representation Learning

Behavior From the Void: Unsupervised Active Pre-Training