Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
2024-02-27
Abstract: Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream tasks in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods. Code for this project is provided at:
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the **Zero-Shot Reinforcement Learning (ZS-RL) problem**. Specifically, the authors seek to pre-train a general agent from a large amount of unlabeled offline trajectories, so that the agent can immediately adapt to any new downstream task without further training or fine-tuning.

### Background and Motivation

In practical applications, it is highly valuable to build an agent that performs well across various tasks. For example, household robots could complete more chores, and autonomous vehicles could reach more destinations. Inspired by the recent success of unsupervised learning in the language and vision domains, the authors pursue a similar goal in reinforcement learning: a general model, trained on large-scale unlabeled data, that can immediately solve various tasks without additional training or fine-tuning.

### Main Methods

To achieve this goal, the authors propose a method called **Functional Reward Encoding (FRE)**. The core idea of FRE is to learn functional representations of arbitrary tasks by encoding their state-reward pairs. The specific steps are as follows:

1. **Functional Reward Encoding (FRE)**:
   - Use a Transformer-based Variational Autoencoder (VAE) to encode state-reward pairs, thereby learning a functional representation of the task in a latent space.
   - This latent space can represent arbitrary reward functions, and the representation of a new task can be identified quickly from a small number of reward-labeled samples.
2. **Pre-training**:
   - Pre-train a multi-task agent on a large amount of unlabeled offline trajectories, using a wide diversity of random unsupervised reward functions as training tasks.
   - During pre-training, the agent learns how to maximize each of these unsupervised reward functions, conditioned on its encoding.
3. **Zero-Shot Adaptation**:
   - At test time, a small number of reward-labeled samples from the new task are encoded into the latent space, and the agent conditioned on that encoding can act on the new task immediately, without further training.

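
The three steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the encoder here is an untrained mean-pooling network standing in for the Transformer-based VAE, the random linear reward is just one assumed example of an unsupervised reward function, and every name (`sample_random_linear_reward`, `encode_reward_samples`, `policy`, the dimension constants) is hypothetical. The sketch only shows the data flow: a set of state-reward pairs is mapped to a latent `z`, and the policy is conditioned on `z`.

```python
import numpy as np

# All sizes below are arbitrary, illustrative choices.
STATE_DIM, EMBED_DIM, LATENT_DIM, ACTION_DIM = 4, 16, 8, 2
rng = np.random.default_rng(0)


def sample_random_linear_reward(rng, state_dim):
    """Assumed example of an unsupervised reward: r(s) = w . s with random w."""
    w = rng.normal(size=state_dim)
    return lambda states: states @ w


# Random, untrained weights. A real FRE encoder would be a trained
# Transformer-based VAE; mean pooling is a simple stand-in.
W_embed = rng.normal(size=(STATE_DIM + 1, EMBED_DIM)) / np.sqrt(STATE_DIM + 1)
W_mean = rng.normal(size=(EMBED_DIM, LATENT_DIM)) / np.sqrt(EMBED_DIM)
W_logvar = rng.normal(size=(EMBED_DIM, LATENT_DIM)) / np.sqrt(EMBED_DIM)
W_policy = rng.normal(size=(STATE_DIM + LATENT_DIM, ACTION_DIM))


def encode_reward_samples(states, rewards):
    """Map K (state, reward) pairs to the mean and log-variance of a latent.

    Pooling with a mean makes the result independent of sample order,
    so the encoding describes the reward function, not a sequence.
    """
    pairs = np.concatenate([states, rewards[:, None]], axis=1)  # (K, STATE_DIM + 1)
    hidden = np.tanh(pairs @ W_embed)                           # (K, EMBED_DIM)
    pooled = hidden.mean(axis=0)                                # (EMBED_DIM,)
    return pooled @ W_mean, pooled @ W_logvar


def sample_latent(mean, logvar, rng):
    """VAE reparameterization: z = mean + std * eps, with eps ~ N(0, I)."""
    return mean + np.exp(0.5 * logvar) * rng.normal(size=mean.shape)


def policy(state, z):
    """Placeholder latent-conditioned policy: the action depends on (state, z)."""
    return np.tanh(np.concatenate([state, z]) @ W_policy)


# Zero-shot use: annotate a few states with the new task's reward,
# encode them once, then act with the policy conditioned on the encoding.
reward_fn = sample_random_linear_reward(rng, STATE_DIM)
states = rng.normal(size=(32, STATE_DIM))
rewards = reward_fn(states)
z_mean, z_logvar = encode_reward_samples(states, rewards)
action = policy(states[0], z_mean)  # using the latent mean at test time is an assumption
```

During pre-training, a sketch like this would draw `z` with `sample_latent` and train the encoder and policy jointly over many sampled reward functions. At test time no gradient step is taken: only `encode_reward_samples` and `policy` are called.
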
### Experimental Results

The authors conducted experiments on multiple standard offline reinforcement learning benchmarks, including the AntMaze, ExORL, and Kitchen environments. The results show that FRE performs well across a range of tasks, often outperforming previous zero-shot RL and offline RL methods. The advantage is most pronounced in goal-reaching tasks, where the FRE agent significantly outperforms existing zero-shot reinforcement learning methods.

### Conclusion

This paper proposes Functional Reward Encoding (FRE), a simple and scalable method for pre-training a general agent from unlabeled offline data so that it can solve new downstream tasks in a zero-shot manner. The experimental results validate the effectiveness of this method, particularly in goal-reaching tasks.