Value Explicit Pretraining for Learning Transferable Representations

Kiran Lekkala, Henghui Bao, Sumedh Sontakke, Laurent Itti
2024-03-07
Abstract: We propose Value Explicit Pretraining (VEP), a method that learns generalizable representations for transfer reinforcement learning. VEP enables learning of new tasks that share similar objectives with previously learned tasks, by learning an encoder for objective-conditioned representations, irrespective of appearance changes and environment dynamics. To pre-train the encoder from a sequence of observations, we use a self-supervised contrastive loss that results in learning temporally smooth representations. VEP learns to relate states across different tasks based on the Bellman return estimate that is reflective of task progress. Experiments using a realistic navigation simulator and Atari benchmark show that the pretrained encoder produced by our method outperforms current SoTA pretraining methods on the ability to generalize to unseen tasks. VEP achieves up to a 2 times improvement in rewards on Atari and visual navigation, and up to a 3 times improvement in sample efficiency. For videos of policy performance, visit https://sites.google.com/view/value-explicit-pretraining/
Machine Learning, Robotics
What problem does this paper attempt to address?
The paper addresses generalizable representation learning for visual sequential decision-making, specifically transfer learning in reinforcement learning. It proposes Value Explicit Pretraining (VEP), a method for learning transferable representations that make it more efficient to learn new tasks whose objectives resemble those of the pretraining tasks. The core idea is to learn an encoder that captures the objective shared across tasks, producing objective-conditioned representations that are insensitive to changes in appearance and environment dynamics. To this end, VEP pretrains the encoder on sequences of observations with a self-supervised contrastive loss, which yields representations that change smoothly over time. VEP relates states from different tasks through their Bellman return estimates, which reflect task progress. Experiments on the Atari benchmark and a realistic navigation simulator show that the VEP-pretrained encoder outperforms current state-of-the-art pretraining methods on unseen tasks, with up to a 2 times improvement in rewards on Atari and visual navigation and up to a 3 times improvement in sample efficiency. Overall, the goal of the paper is to learn representations that are directly useful for downstream control by leveraging the reward labels in offline demonstration datasets, without the need for online training. VEP can thereby improve the generality and sample efficiency of policy learning while the pretrained encoder itself is left unaltered.
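The idea of relating states across tasks by their return estimates can be illustrated with a small sketch. It is not the authors' implementation: the text here does not give the exact loss, so the sketch assumes a Monte Carlo discounted return as the Bellman return estimate, nearest-return matching to pick a positive from another task, and a standard InfoNCE contrastive loss. The function names `discounted_returns`, `nearest_by_return`, and `info_nce`, and the `gamma` and `temperature` values, are hypothetical choices made for illustration.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for one trajectory.

    Assumption: a Monte Carlo return stands in for the Bellman return estimate.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def nearest_by_return(anchor_return, other_returns):
    """Index of the state in another task whose return is closest to the anchor's.

    Assumption: closeness in return is used as a proxy for similar task progress.
    """
    return min(
        range(len(other_returns)),
        key=lambda i: abs(other_returns[i] - anchor_return),
    )


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor embedding (dot-product similarity)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Two toy trajectories from different tasks with sparse terminal rewards.
returns_a = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
returns_b = discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5)
# Pair the middle state of task A with the state of task B at similar progress.
positive_index = nearest_by_return(returns_a[1], returns_b)
```

In this toy setting, the middle state of the first trajectory is paired with the second-to-last state of the second trajectory, since both are one step from the reward. In a pretraining loop, the encoder's embeddings of such return-matched pairs would serve as the anchor and positive in `info_nce`, with embeddings of states at different returns as negatives.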