Abstract: Deep reinforcement learning agents usually need to collect a large number of interactions to solve a single task. In contrast, meta-reinforcement learning (meta-RL) aims to adapt quickly to new tasks from a small amount of experience by leveraging knowledge gained from training on a set of similar tasks. State-of-the-art context-based meta-RL algorithms use the context to encode task information and train a policy conditioned on the inferred latent task encoding. However, most recent works are limited to parametric tasks, where a handful of variables control the full variation in the task distribution, and they also fail in non-stationary environments because of the few-shot adaptation setting. To address these limitations, we propose MEta-reinforcement Learning with Task Self-discovery (MELTS), which adaptively learns qualitatively different nonparametric tasks and adapts to new tasks in a zero-shot manner. We introduce a novel deep clustering framework (DPMM-VAE) based on an infinite mixture of Gaussians, which combines the Dirichlet process mixture model (DPMM) and the variational autoencoder (VAE) to simultaneously learn task representations and cluster the tasks in a self-adaptive way. Integrating DPMM-VAE into MELTS enables it to adaptively discover the multi-modal structure of the nonparametric task distribution, which previous methods using isotropic Gaussian random variables cannot model. In addition, we propose a zero-shot adaptation mechanism and a recurrence-based context encoding strategy to improve data efficiency and make our algorithm applicable in non-stationary environments. On various continuous control tasks with both parametric and nonparametric variations, our algorithm produces a more structured and self-adaptive task latent space and achieves superior sample efficiency and asymptotic performance compared with state-of-the-art meta-RL algorithms.
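To make the idea of clustering tasks "in a self-adaptive way" more concrete, the following is a minimal sketch of DP-means, a hard-assignment limit of a Dirichlet process mixture of Gaussians in which the number of clusters is not fixed in advance. It is not the DPMM-VAE described in the abstract: there is no VAE, no variational inference, and no policy. The 2-D "latent task encodings", the threshold `lam`, and the function name `dp_means` are all hypothetical choices made for illustration.

```python
import random


def dp_means(points, lam, n_iters=10):
    """Hard-assignment DP-means clustering (illustrative sketch only).

    A new cluster is opened whenever a point's squared distance to its
    nearest centroid exceeds `lam`, so the cluster count adapts to the data.
    """
    dim = len(points[0])
    # Start from a single cluster located at the global mean.
    centroids = [[sum(p[d] for p in points) / len(points) for d in range(dim)]]
    assignments = [0] * len(points)

    for _ in range(n_iters):
        # Assignment step: nearest centroid, or a brand-new cluster.
        for i, p in enumerate(points):
            dists = [sum((p[d] - c[d]) ** 2 for d in range(dim)) for c in centroids]
            k = min(range(len(centroids)), key=dists.__getitem__)
            if dists[k] > lam:
                centroids.append(list(p))
                k = len(centroids) - 1
            assignments[i] = k

        # Update step: recompute means and drop clusters left empty.
        new_centroids, remap = [], {}
        for k in range(len(centroids)):
            members = [points[i] for i, a in enumerate(assignments) if a == k]
            if members:
                remap[k] = len(new_centroids)
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
        centroids = new_centroids
        assignments = [remap[a] for a in assignments]

    return centroids, assignments


# Hypothetical latent encodings: three well-separated task families,
# 30 samples each, generated in blob order.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0.0, 0.3), cy + rng.gauss(0.0, 0.3))
    for cx, cy in blob_centers
    for _ in range(30)
]

centroids, assignments = dp_means(points, lam=9.0)
print("clusters discovered:", len(centroids))
```

The number of clusters is never passed in; it emerges from the data and the threshold `lam`, which loosely plays the role of the concentration parameter in a Dirichlet process. A full DPMM would instead maintain soft, probabilistic assignments and per-cluster covariances.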
