Zero-Shot Reinforcement Learning via Function Encoders

Tyler Ingebrand, Amy Zhang, Ufuk Topcu
2024-05-11
Abstract: Although reinforcement learning (RL) can solve many challenging sequential decision-making problems, achieving zero-shot transfer across related tasks remains a challenge. The difficulty lies in finding a good representation for the current task so that the agent understands how it relates to previously seen tasks. To achieve zero-shot transfer, we introduce the function encoder, a representation learning algorithm which represents a function as a weighted combination of learned, non-linear basis functions. By using a function encoder to represent the reward function or the transition function, the agent has information on how the current task relates to previously seen tasks via a coherent vector representation. Thus, the agent is able to achieve transfer between related tasks at run time with no additional training. We demonstrate state-of-the-art data efficiency, asymptotic performance, and training stability in three RL fields by augmenting basic RL algorithms with a function encoder task representation.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses zero-shot transfer in reinforcement learning (RL): given a family of related tasks, an agent should be able to solve a new task from that family at run time, with no additional training. This matters in real-world settings, for example an autonomous robot that must perform a variety of cooking and cleaning tasks in a kitchen, or one that must cope with different slippery conditions when operating outdoors in winter. To this end, the authors propose a representation learning algorithm called the function encoder. It represents a function as a weighted combination of learned, non-linear basis functions, so the weights form a vector that coherently describes how the current task relates to previously seen tasks. Applying the function encoder to the reward function or the transition function gives the agent this task representation, which enables zero-shot transfer across related tasks. The authors report state-of-the-art data efficiency, asymptotic performance, and training stability in three RL fields when basic RL algorithms are augmented with a function encoder task representation.
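To make the idea of "a function as a weighted combination of basis functions" concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the basis functions here are fixed and hand-picked (in the paper they are learned), the coefficients are fitted by ordinary least squares (the summary above does not say how the paper computes them), and the names `basis`, `encode`, and `decode` are hypothetical.

```python
import numpy as np


def basis(x):
    """Hypothetical fixed non-linear basis functions.

    These stand in for the learned basis functions described in the paper.
    Returns an array of shape (len(x), 4), one column per basis function.
    """
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), np.sin(x), np.cos(x), x], axis=-1)


def encode(xs, ys):
    """Fit coefficients c so that basis(xs) @ c approximates ys.

    Assumption: ordinary least squares is used here for illustration only.
    The returned vector plays the role of the task representation.
    """
    coeffs, *_ = np.linalg.lstsq(basis(xs), np.asarray(ys, dtype=float), rcond=None)
    return coeffs


def decode(coeffs, xs):
    """Evaluate the weighted combination of basis functions at xs."""
    return basis(xs) @ coeffs


# A toy "task": a scalar reward function that lies in the span of the basis.
rng = np.random.default_rng(0)
xs = rng.uniform(-3.0, 3.0, size=200)
true_coeffs = np.array([0.5, 2.0, -1.0, 0.3])
ys = decode(true_coeffs, xs)

# Encode the task from example data alone; no gradient steps are taken.
task_vector = encode(xs, ys)
reconstruction_error = float(np.max(np.abs(decode(task_vector, xs) - ys)))
```

In this sketch, `task_vector` is the fixed-length description of the task. Following the summary above, such a vector could be supplied to an RL algorithm as an additional input, so that a single policy can act on different reward or transition functions without retraining; how exactly it is consumed is not specified here.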