Abstract: Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream tasks in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods. Code for this project is provided at:

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address the **Zero-Shot Reinforcement Learning (ZS-RL) problem**. Specifically, the authors seek to pre-train a general agent from a large amount of unlabeled offline trajectories, enabling the agent to adapt immediately to any new downstream task without further training or fine-tuning.

### Background and Motivation

In practical applications, it is highly valuable to build an agent that performs well across many tasks. For example, household robots could complete more chores, and autonomous vehicles could reach more destinations. Inspired by the recent success of unsupervised learning in the language and vision domains, the authors pursue a similar goal in reinforcement learning: a general model trained on large-scale unlabeled data that can immediately solve a variety of tasks without additional training or fine-tuning.

### Main Methods

To achieve this goal, the authors propose a method called **Functional Reward Encoding (FRE)**. The core idea of FRE is to learn functional representations of arbitrary tasks by encoding state-reward pairs. The specific steps are as follows:

1. **Functional Reward Encoding (FRE)**:
   - Use a Transformer-based Variational Autoencoder (VAE) to encode state-reward pairs, thereby learning a functional representation in a latent space.
   - This latent space can represent arbitrary reward functions, and the representation of a new task can be identified quickly from a small number of reward-labeled samples.
2. **Pre-training**:
   - Pre-train a multi-task agent on a large amount of unlabeled offline trajectories, using a diverse set of random unsupervised reward functions as training tasks.
   - During pre-training, the agent learns how to maximize these unsupervised reward functions.
3. **Zero-Shot Adaptation**:
   - At test time, a small number of reward-labeled samples from the new task are encoded into the latent space, and the agent conditioned on that encoding can immediately adapt to the new task without further training.
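The three steps above can be sketched in a few lines of NumPy. This is only an illustration of the data flow under stated assumptions, not the authors' implementation: the weights are random and untrained, a mean-pooled set encoder stands in for the Transformer-based VAE, random linear rewards stand in for the unsupervised reward family, and every name (`sample_linear_reward`, `encode`, `act`, the dimension constants) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes only; none of these come from the paper.
STATE_DIM, ACTION_DIM, HIDDEN, LATENT, K = 4, 2, 16, 8, 32


def sample_linear_reward(rng, state_dim):
    """Sample a hypothetical unsupervised reward r(s) = w . s."""
    w = rng.normal(size=state_dim)
    return lambda states: states @ w


def init_encoder(rng):
    """Random (untrained) encoder weights, for shape illustration only."""
    return {
        "embed": rng.normal(scale=0.5, size=(STATE_DIM + 1, HIDDEN)),
        "mu": rng.normal(scale=0.5, size=(HIDDEN, LATENT)),
        "logvar": rng.normal(scale=0.5, size=(HIDDEN, LATENT)),
    }


def encode(params, states, rewards):
    """Map a set of (state, reward) pairs to Gaussian latent parameters.

    Mean pooling makes the output independent of sample order. It is a
    simplified stand-in for the Transformer encoder described above.
    """
    pairs = np.concatenate([states, rewards[:, None]], axis=1)  # (K, S + 1)
    hidden = np.tanh(pairs @ params["embed"])                   # (K, HIDDEN)
    pooled = hidden.mean(axis=0)                                # (HIDDEN,)
    return pooled @ params["mu"], pooled @ params["logvar"]


def act(policy_w, state, z):
    """Latent-conditioned policy: the action depends on the state and task code z."""
    return np.tanh(np.concatenate([state, z]) @ policy_w)


# Zero-shot adaptation: label K states with the new task's reward, encode, act.
reward_fn = sample_linear_reward(rng, STATE_DIM)
states = rng.normal(size=(K, STATE_DIM))
rewards = reward_fn(states)

enc = init_encoder(rng)
mu, logvar = encode(enc, states, rewards)
z = mu  # assumption: use the posterior mean as the task code at test time

policy_w = rng.normal(scale=0.5, size=(STATE_DIM + LATENT, ACTION_DIM))
action = act(policy_w, states[0], z)
```

In a trained system the encoder and the policy would be learned during pre-training. The point of the sketch is that adapting to a new task requires only a forward pass through the encoder on a handful of reward-labeled states, with no gradient updates.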
### Experimental Results

The authors conducted experiments on multiple standard offline reinforcement learning benchmarks, including the AntMaze, ExORL, and Kitchen environments. The results show that FRE performs well across a range of tasks. The gains are largest on goal-reaching tasks, where the FRE agent significantly outperforms existing zero-shot reinforcement learning methods.

### Conclusion

This paper proposes Functional Reward Encoding (FRE), a simple and scalable method for pre-training a general agent from unlabeled offline data so that it can solve new downstream tasks in a zero-shot manner. The experimental results validate the effectiveness of this method, particularly on goal-reaching tasks.

Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

Zero-Shot Reinforcement Learning via Function Encoders

Accelerating Exploration with Unlabeled Prior Data

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Semi-supervised reward learning for offline reinforcement learning

Uncertainty-Aware Reward-Free Exploration with General Function Approximation

End-to-End Robotic Reinforcement Learning without Reward Engineering

An Ensemble Fuzzy Approach for Inverse Reinforcement Learning

Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning

Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos

Self-Supervised Reinforcement Learning that Transfers using Random Features

Unsupervised Behavior Extraction via Random Intent Priors

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation.

PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement

RL Zero: Zero-Shot Language to Behaviors without any Supervision

Unsupervised Control Through Non-Parametric Discriminative Rewards

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Become a Proficient Player with Limited Data through Watching Pure Videos

Sequence Prediction with Unlabeled Data by Reward Function Learning

Towards model-free RL algorithms that scale well with unstructured data