Abstract: The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings which are cumbersome to design for every new environment. In this work, we propose the "Successor Features Keyboard" (SFK), which enables transfer with discovered state-features and task encodings. To enable discovery, we propose the "Categorical Successor Feature Approximator" (CSFA), a novel learning algorithm for estimating SFs while jointly discovering state-features and task encodings. With SFK and CSFA, we achieve the first demonstration of transfer with SFs in a challenging 3D environment where all the necessary representations are discovered. We first compare CSFA against other methods for approximating SFs and show that only CSFA discovers representations compatible with SF&GPI at this scale. We then compare SFK against transfer learning baselines and show that it transfers most quickly to long-horizon tasks.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper addresses how to transfer behavioral knowledge effectively in deep reinforcement learning, and in particular how to reuse known behaviors to adapt quickly to new, long-horizon tasks. It proposes two methods: **Successor Features Keyboard (SFK)** and **Categorical Successor Feature Approximator (CSFA)**.

#### Background problems

- **Limitations of existing methods**: Existing methods such as the Option Keyboard (OK) adapt to new tasks by combining known behaviors, but they rely on hand-designed state features and task encodings, which must be redesigned for every new environment.
- **Dynamic queries and representation discovery**: Ideally, a transfer method should support dynamic queries, discover state features and task preferences automatically, and share these representations across tasks.

#### Specific problems

- **Dynamic queries**: How can known behaviors be queried dynamically to adapt to new tasks without hand-designed representations?
- **Representation discovery**: How can state features and task preferences be discovered automatically in complex environments?
- **Large-scale multi-task settings**: How can task encoders and successor feature estimators be shared across many tasks to improve knowledge transfer?

### Solutions

- **Successor Features Keyboard (SFK)**: Adapts quickly to new tasks by dynamically selecting linear combinations of known task behaviors.
- **Categorical Successor Feature Approximator (CSFA)**: A new learning algorithm that jointly learns successor features, state features, and task preferences, enabling transfer in complex environments.
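As a rough illustration of how SFs and GPI combine known behaviors, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the standard SF&GPI formulation, in which the value of action `a` under known policy `i` is the dot product of that policy's successor features with a preference vector `w`, and GPI picks the action whose best value across all known policies is highest. The function name `gpi_action`, the array layout, and the toy numbers are all hypothetical.

```python
import numpy as np


def gpi_action(psi: np.ndarray, w: np.ndarray) -> int:
    """Pick an action by Generalized Policy Improvement over known policies.

    psi: successor features at the current state, with the assumed shape
         (n_policies, n_actions, n_features).
    w:   preference vector over features, shape (n_features,).
    """
    q = psi @ w                 # action values per policy: (n_policies, n_actions)
    best_per_action = q.max(axis=0)  # best value any known policy gives each action
    return int(best_per_action.argmax())


# Toy successor features for 2 known policies, 3 actions, 2 features (made-up values).
psi = np.array([
    [[1.0, 0.0], [0.2, 0.1], [0.0, 0.0]],  # policy 0 mostly collects feature 0
    [[0.0, 0.0], [0.1, 0.2], [0.0, 1.0]],  # policy 1 mostly collects feature 1
])

action_for_feature_0 = gpi_action(psi, np.array([1.0, 0.0]))
action_for_feature_1 = gpi_action(psi, np.array([0.0, 1.0]))
action_for_mixture = gpi_action(psi, np.array([0.5, 1.0]))
print(action_for_feature_0, action_for_feature_1, action_for_mixture)
```

In this sketch `w` is supplied by hand. In the setting the summary describes, the preference vector would instead be chosen dynamically by a learned policy, and both the features and the task encodings would be discovered rather than designed.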
### Experimental verification

- **Environment**: Experiments are carried out in Playroom, a complex 3D environment with long-horizon tasks, high-dimensional pixel observations, and sparse rewards.
- **Baseline methods**: CSFA is compared with other successor feature approximators, including Universal Successor Feature Approximators (USFA) and Modular Successor Feature Approximators (MSFA). SFK is compared with transfer learning baselines, including multi-task reinforcement learning (MTRL) and Distral.
- **Results**: SFK transfers to long-horizon tasks more quickly than the other methods and learns new tasks with fewer samples.

### Summary

By proposing SFK and CSFA, the paper shows how behavioral knowledge can be transferred in complex environments with dynamic queries and automatically discovered representations, without hand-designed state features or task encodings, enabling rapid adaptation to long-horizon tasks.

Combining Behaviors with the Successor Features Keyboard

Composing Task Knowledge with Modular Successor Feature Approximators

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

Learning Successor Features the Simple Way

Successor Feature Neural Episodic Control

Safety-Constrained Policy Transfer with Successor Features

Uncertainty-aware transfer across tasks using hybrid model-based successor feature reinforcement learning

SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning

Multi-Task Reinforcement Learning in Continuous Control with Successor Feature-Based Concurrent Composition

Option Transfer and SMDP Abstraction with Successor Features

Using Memory-Based Learning to Solve Tasks with State-Action Constraints

Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning

Temporally extended successor feature neural episodic control

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

AKF-SR: Adaptive Kalman Filtering-based Successor Representation

Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning

Modeling Long-horizon Tasks as Sequential Interaction Landscapes

Exploration by Learning Diverse Skills through Successor State Measures