Abstract: Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that alternating optimization between the context encoder and the policy can lead to performance improvements, as long as the context encoder follows the principle of maximizing the mutual information between the task variable $M$ and its latent representation $Z$ ($I(Z;M)$) while the policy adopts standard offline reinforcement learning (RL) algorithms conditioned on the learned task representation. Despite promising results, the theoretical justification of performance improvements for such intuition remains underexplored. Inspired by the return discrepancy scheme in the model-based RL field, we find that the previous optimization framework can be linked with the general RL objective of maximizing the expected return, thereby explaining performance improvements.
Furthermore, after scrutinizing this optimization framework, we find it ignores the variation of the task representation in the alternating optimization process, which weakens the condition necessary for monotonic performance improvements, and may therefore violate the monotonicity. We name this issue task representation shift and theoretically prove that the monotonic performance improvements can be guaranteed with appropriate context encoder updates. We use different settings to rein in the task representation shift on three widely adopted training objectives concerning maximizing $I(Z;M)$ across different data qualities. Empirical results show that reining in the task representation shift can indeed improve performance.

Model-based Adversarial Meta-Reinforcement Learning

Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator

Curriculum in Gradient-Based Meta-Reinforcement Learning

NoRML: No-Reward Meta Learning

Meta-Reinforcement Learning with Dynamic Adaptiveness Distillation

Constrained Meta Agnostic Reinforcement Learning

MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning

A Survey of Meta-Reinforcement Learning

MAML2: meta reinforcement learning via meta-learning for task categories

Exploration With Task Information for Meta Reinforcement Learning

Predictive value of a positive exercise stress testing and correlations with cardiovascular risk factors.

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

Introducing Symmetries to Black Box Meta Reinforcement Learning

Meta-Adversarial Inverse Reinforcement Learning for Decision-making Tasks

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Unsupervised Meta-Learning for Reinforcement Learning

MetaRM: Shifted Distributions Alignment via Meta-Learning

Prediction Guided Meta-Learning for Multi-Objective Reinforcement Learning

Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

Rethinking Meta-Learning from a Learning Lens