Combining Behaviors with the Successor Features Keyboard

Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew Kyle Lampinen, Loic Matthey, Richard L. Lewis, Honglak Lee, Satinder Singh, Danilo J. Rezende, Daniel Zoran
DOI: https://doi.org/10.48550/arXiv.2310.15940
2023-10-24
Abstract: The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings which are cumbersome to design for every new environment. In this work, we propose the "Successor Features Keyboard" (SFK), which enables transfer with discovered state-features and task encodings. To enable discovery, we propose the "Categorical Successor Feature Approximator" (CSFA), a novel learning algorithm for estimating SFs while jointly discovering state-features and task encodings. With SFK and CSFA, we achieve the first demonstration of transfer with SFs in a challenging 3D environment where all the necessary representations are discovered. We first compare CSFA against other methods for approximating SFs and show that only CSFA discovers representations compatible with SF&GPI at this scale. We then compare SFK against transfer learning baselines and show that it transfers most quickly to long-horizon tasks.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses how to transfer behavioral knowledge effectively in deep reinforcement learning, and in particular how to reuse known behaviors to adapt quickly to new, long-horizon tasks. It proposes two methods: **Successor Features Keyboard (SFK)** and **Categorical Successor Feature Approximator (CSFA)**.

#### Background problems

- **Limitations of existing methods**: Existing methods such as the Option Keyboard (OK) adapt to new tasks by combining known behaviors, but they rely on hand-designed state features and task encodings, which are cumbersome to design for every new environment.
- **Dynamic queries and discovered representations**: Ideally, a transfer method should support dynamic queries, discover state features and task preferences automatically, and share these representations across tasks.

#### Specific problems

- **Dynamic queries**: How can an agent query known behaviors dynamically to adapt to a new task without hand-designed components?
- **Discovered representations**: How can state features and task preferences be discovered automatically in complex environments?
- **Large-scale multi-task settings**: How can task encoders and successor feature estimators be shared across many tasks to improve knowledge transfer?

### Solutions

- **Successor Features Keyboard (SFK)**: Adapts quickly to new tasks by dynamically selecting linear combinations of known task behaviors.
- **Categorical Successor Feature Approximator (CSFA)**: A new learning algorithm that jointly learns successor features, state features, and task preferences, which makes transfer possible in complex environments.
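
The SF&GPI mechanism that both OK and SFK build on can be illustrated with a small sketch. It is not the paper's implementation. It assumes the standard formulation, in which the value of known policy `i` under a task preference vector `w` is the dot product of its successor features with `w`, and GPI picks the action that maximizes the best such value over all known policies. The function name `gpi_action`, the toy numbers, and the hand-picked vectors `w` are hypothetical; in SFK the preference vector would come from a learned selection rather than being fixed by hand.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gpi_action(
    sfs: List[List[List[float]]], w: Sequence[float]
) -> Tuple[int, float]:
    """Generalized Policy Improvement at a single state.

    sfs[i][a] is the successor-feature vector of known policy i for
    action a at the current state (assumed to be given already).
    Returns the greedy action and its GPI value max_i sfs[i][a] . w.
    """
    num_actions = len(sfs[0])
    best_action, best_value = 0, float("-inf")
    for a in range(num_actions):
        # Best value any known policy promises for this action under w.
        q = max(dot(policy_sfs[a], w) for policy_sfs in sfs)
        if q > best_value:
            best_action, best_value = a, q
    return best_action, best_value


# Toy example: 2 known policies, 3 actions, 2 state-feature dimensions.
sfs = [
    [[2.0, 0.0], [1.0, 0.5], [0.0, 0.0]],  # policy 0 mostly gathers feature 0
    [[0.0, 0.5], [0.5, 1.0], [0.0, 3.0]],  # policy 1 mostly gathers feature 1
]

print(gpi_action(sfs, [1.0, 0.0]))  # preference for feature 0 only
print(gpi_action(sfs, [0.0, 1.0]))  # preference for feature 1 only
```

Changing `w` changes which known behavior GPI follows without retraining any successor features, which is the property that combining behaviors through a preference vector relies on.
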
### Experimental verification

- **Environment**: Experiments are carried out in Playroom, a complex 3D environment with long-horizon tasks, high-dimensional pixel observations, and sparse rewards.
- **Baseline methods**: CSFA is compared against other successor feature approximators, including Universal Successor Feature Approximators (USFA) and Modular Successor Feature Approximators (MSFA). SFK is compared against transfer learning baselines, including multi-task reinforcement learning (MTRL) and Distral.
- **Results**: SFK transfers to long-horizon tasks better than the other methods and learns new tasks with fewer samples.

### Summary

By proposing SFK and CSFA, the paper addresses how to transfer behavioral knowledge in complex environments. In particular, it shows how to support dynamic queries and discover the required representations automatically, without hand-designed components, so that an agent can adapt rapidly to long-horizon tasks.