Abstract: With the development of deep learning, deep reinforcement learning (DRL) has successfully built intelligent agents for sequential decision-making problems through interaction with image-based environments. However, learning from unlimited interaction is impractical and sample inefficient, because training an agent requires extensive trial and error and a large number of samples. One response to this problem is sample-efficient DRL, a research area that aims to learn effective state representations from limited interaction with image-based environments. Previous methods could surpass human performance by training an RL agent with self-supervised learning and data augmentation to learn good state representations from a given amount of interaction. However, most existing methods consider only the similarity of image observations, which makes it hard for them to capture semantic representations. To address this challenge, we propose spatio-temporal and action-based contrastive representation (STACoRe) learning for sample-efficient DRL. STACoRe performs two contrastive learning tasks to learn proper state representations: one uses the agent's actions as pseudo labels, and the other uses spatio-temporal information. In particular, for the action-based contrastive learning, we propose a method that automatically selects data augmentation techniques suited to each environment for stable model training. We train the model end to end by simultaneously optimizing an action-based contrastive loss and spatio-temporal contrastive losses, which improves the sample efficiency of DRL. We evaluate on 26 benchmark games from Atari 2600, with environment interaction limited to only 100k steps. The experimental results confirm that our method is more sample efficient than existing methods. The code is available at https://github.com/dudwojae/STACoRe.
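
To make the idea of using actions as pseudo labels concrete, the following is a minimal sketch of a supervised-contrastive-style loss in NumPy, in which embeddings that share an action label are treated as positives. It is an illustration under stated assumptions and not the formulation from the paper or its repository: the function name `action_contrastive_loss`, the `temperature` value, and the toy embeddings are all hypothetical, and the spatio-temporal losses and the automatic augmentation selection are not shown.

```python
import numpy as np


def action_contrastive_loss(z, actions, temperature=0.1):
    """Supervised-contrastive-style loss with actions as pseudo labels.

    Hypothetical sketch: z is an (n, d) array of state embeddings and
    actions is an (n,) array of discrete action ids, with n >= 2.
    """
    z = np.asarray(z, dtype=float)
    actions = np.asarray(actions)
    n = z.shape[0]

    # L2-normalize so the dot product is cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # Exclude each anchor from its own softmax denominator.
    not_self = ~np.eye(n, dtype=bool)
    logits = np.where(not_self, logits, -np.inf)

    # Numerically stable log-softmax over the other samples in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other samples that took the same action as the anchor.
    positives = (actions[:, None] == actions[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # anchors with no positive are skipped
    if not valid.any():
        return 0.0

    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float((-pos_log_prob[valid] / pos_count[valid]).mean())


# Toy embeddings: two samples near each axis.
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
aligned = np.array([0, 0, 1, 1])     # labels agree with the clusters
misaligned = np.array([0, 1, 0, 1])  # labels cut across the clusters

loss_aligned = action_contrastive_loss(z, aligned)
loss_misaligned = action_contrastive_loss(z, misaligned)
```

In this toy case the loss is lower when embeddings that share an action are already close together. In a joint objective of the kind the abstract describes, a term like this would be summed with the spatio-temporal contrastive terms; how those terms are weighted is not stated here.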


STACoRe: Spatio-temporal and Action-Based Contrastive Representations for Reinforcement Learning in Atari
