Abstract: Deep reinforcement learning (RL) is a powerful approach to complex decision making. However, one issue that limits its practical application is its brittleness: training sometimes fails in the presence of small changes in the environment. This work is motivated by the empirical observation that directly applying an already trained model to a related task, known as zero-shot transfer, often works remarkably well. We take this practical trick one step further and consider how to systematically select good training tasks so as to maximize overall performance across a range of tasks. Given the high cost of training, it is critical to choose a small set of training tasks. The key idea behind our approach is to explicitly model the performance loss (generalization gap) incurred by transferring a trained model. We hence introduce Model-Based Transfer Learning (MBTL) for solving contextual RL problems. In this work, we model the performance loss as a simple linear function of task context similarity. Furthermore, we leverage Bayesian optimization techniques to efficiently model and estimate the unknown training performance across the task space. We theoretically show that the method exhibits regret that is sublinear in the number of training tasks and discuss conditions to further tighten regret bounds. We experimentally validate our methods using urban traffic and standard control benchmarks. Despite its conceptual simplicity, the experimental results suggest that MBTL can achieve greater performance than strong baselines, including exhaustive training on all tasks, multi-task training, and random selection of training tasks. This work lays the foundations for investigating explicit modeling of generalization, thereby enabling principled yet effective methods for contextual RL.
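
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: task contexts are scalars, the training performance of every task is already known (the abstract instead estimates it with Bayesian optimization, which is omitted here), and the generalization gap is `slope * |source - target|`. The names `transferred_performance`, `select_training_tasks`, `train_perf`, and `slope` are hypothetical.

```python
import numpy as np


def transferred_performance(train_perf, contexts, sources, slope):
    """Estimate performance on every target context when the best of the
    trained source models is applied zero-shot.

    Assumption: the generalization gap grows linearly with the distance
    between the source and target contexts.
    """
    src = np.asarray(sources, dtype=int)
    # gap[i, j]: assumed performance loss when source i is used on target j
    gap = slope * np.abs(contexts[src][:, None] - contexts[None, :])
    estimate = train_perf[src][:, None] - gap
    # Each target uses whichever trained source transfers best to it
    return estimate.max(axis=0)


def select_training_tasks(train_perf, contexts, budget, slope):
    """Greedily pick `budget` training tasks that maximize the estimated
    mean performance over all target contexts."""
    sources = []
    for _ in range(budget):
        candidates = [i for i in range(len(contexts)) if i not in sources]
        scores = [
            transferred_performance(train_perf, contexts, sources + [i], slope).mean()
            for i in candidates
        ]
        sources.append(candidates[int(np.argmax(scores))])
    return sources


# Hypothetical example: 11 evenly spaced contexts with equal training performance
contexts = np.linspace(0.0, 1.0, 11)
train_perf = np.ones(11)
chosen = select_training_tasks(train_perf, contexts, budget=2, slope=1.0)
print(chosen)
```

Under these assumptions, adding a source task can never lower the estimate for any target, because each target takes the maximum over the available sources. The greedy loop simply asks which additional task raises the mean the most.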

Fractional Transfer Learning for Deep Model-Based Reinforcement Learning

Self-Supervised Reinforcement Learning that Transfers using Random Features

Decision-Focused Model-based Reinforcement Learning for Reward Transfer

Sample-efficient multi-agent reinforcement learning with masked reconstruction

Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

TransDreamer: Reinforcement Learning with Transformer World Models

Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination

Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning

Federated Transfer Reinforcement Learning for Autonomous Driving

DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations

Dynamic Sparse Training for Deep Reinforcement Learning

Influence-Augmented Local Simulators: A Scalable Solution for Fast Deep RL in Large Networked Systems

Model-Based Transfer Learning for Contextual Reinforcement Learning

Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep Reinforcement Learning

Efficient World Models with Context-Aware Tokenization

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Efficient Deep Reinforcement Learning Via Adaptive Policy Transfer

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

Efficient Deep Reinforcement Learning Through Policy Transfer

Latent Imagination Facilitates Zero-Shot Transfer in Autonomous Racing

TWIST: Teacher-Student World Model Distillation for Efficient Sim-to-Real Transfer