Value Explicit Pretraining for Learning Transferable Representations

Kiran Lekkala, Henghui Bao, Sumedh Sontakke, Laurent Itti

2024-03-07

Abstract: We propose Value Explicit Pretraining (VEP), a method that learns generalizable representations for transfer reinforcement learning. VEP enables learning of new tasks that share similar objectives as previously learned tasks, by learning an encoder for objective-conditioned representations, irrespective of appearance changes and environment dynamics. To pre-train the encoder from a sequence of observations, we use a self-supervised contrastive loss that results in learning temporally smooth representations. VEP learns to relate states across different tasks based on the Bellman return estimate that is reflective of task progress. Experiments using a realistic navigation simulator and Atari benchmark show that the pretrained encoder produced by our method outperforms current SoTA pretraining methods on the ability to generalize to unseen tasks. VEP achieves up to a 2 times improvement in rewards on Atari and visual navigation, and up to a 3 times improvement in sample efficiency. For videos of policy performance, visit https://sites.google.com/view/value-explicit-pretraining/

Machine Learning, Robotics

What problem does this paper attempt to address?

The paper addresses generalizable representation learning for visual sequential decision-making, specifically for transfer in reinforcement learning. It proposes Value Explicit Pretraining (VEP), a method for learning transferable representations that make it more efficient to learn new tasks whose objectives resemble those of the pretraining tasks. The core idea is to learn an encoder that captures the objective shared across tasks, producing objective-conditioned representations that are insensitive to changes in appearance and environment dynamics. To this end, VEP pretrains the encoder with a self-supervised contrastive loss, which yields representations that change smoothly over time. States from different tasks are related to one another through their Bellman return estimates, which reflect task progress. Experiments on the Atari benchmark and a realistic visual navigation simulator show that, on unseen tasks, the VEP-pretrained encoder achieves up to a 2 times improvement in rewards and up to a 3 times improvement in sample efficiency compared with current state-of-the-art pretraining methods. Overall, the goal of the paper is to learn representations that are directly useful for downstream control by leveraging reward labels in offline demonstration datasets, without the need for online training. The VEP method can improve the generality and sample efficiency of policy learning without altering the encoder.
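The two ingredients named above, a return estimate computed from reward labels and a contrastive loss that relates states with similar returns, can be illustrated with a minimal sketch. This is not the paper's implementation: the Monte Carlo form of the return estimate, the nearest-return rule for choosing positives, the InfoNCE-style loss, and the names `discounted_returns` and `value_matched_infonce` are all assumptions made for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return estimate via the recursion G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def value_matched_infonce(anchor_emb, other_emb, anchor_ret, other_ret,
                          temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    For each anchor state, the positive is the state from the other task
    whose return estimate is closest; all remaining states act as negatives.
    """
    # Cosine similarity between every anchor and every candidate embedding.
    a = anchor_emb / np.linalg.norm(anchor_emb, axis=1, keepdims=True)
    b = other_emb / np.linalg.norm(other_emb, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature

    # Index of the candidate with the nearest return estimate, per anchor.
    positives = np.abs(anchor_ret[:, None] - other_ret[None, :]).argmin(axis=1)

    # Numerically stable log-softmax over candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(a)), positives].mean()


# Toy usage: two hypothetical tasks with the same sparse reward structure.
returns_a = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)  # 0.25, 0.5, 1.0
returns_b = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
embeddings = np.eye(3)  # stand-in for encoder outputs
loss = value_matched_infonce(embeddings, embeddings, returns_a, returns_b)
```

In this toy setup the loss is small when states with matching return estimates already share an embedding, and it grows when the embeddings of the second task are shuffled relative to their returns. A pretraining loop would minimize such a loss with respect to the encoder parameters.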

Pre-trained Visual Dynamics Representations for Efficient Policy Learning

Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning

Robotic Offline RL from Internet Videos via Value-Function Pre-Training

What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training

Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts

BEVT: BERT Pretraining of Video Transformers

Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations

Real-World Robot Learning with Masked Visual Pre-training

SimVTP: Simple Video Text Pre-training with Masked Autoencoders

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

Efficient Transfer Learning for Video-language Foundation Models

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

Transferable Post-training via Inverse Value Learning

Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

Learning from Visual Observation via Offline Pretrained State-to-Go Transformer

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Evaluating Protein Transfer Learning with TAPE

VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training