Abstract: Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that alternating optimization between the context encoder and the policy can lead to performance improvements, as long as the context encoder follows the principle of maximizing the mutual information between the task variable $M$ and its latent representation $Z$ ($I(Z;M)$) while the policy adopts standard offline reinforcement learning (RL) algorithms conditioned on the learned task representation. Despite promising results, the theoretical justification of performance improvements for such intuition remains underexplored. Inspired by the return discrepancy scheme in the model-based RL field, we find that the previous optimization framework can be linked with the general RL objective of maximizing the expected return, thereby explaining performance improvements.
Furthermore, after scrutinizing this optimization framework, we find it ignores the variation of the task representation in the alternating optimization process, which weakens the condition necessary for monotonic performance improvements, and may therefore violate the monotonicity. We name this issue \underline{task representation shift} and theoretically prove that the monotonic performance improvements can be guaranteed with appropriate context encoder updates. We use different settings to rein in the task representation shift on three widely adopted training objectives concerning maximizing $I(Z;M)$ across different data qualities. Empirical results show that reining in the task representation shift can indeed improve performance.

Contextual Policy Transfer in Meta-Reinforcement Learning via Active Learning

Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

Learn to Effectively Explore in Context-Based Meta-RL

Towards Effective Context for Meta-Reinforcement Learning: an Approach Based on Contrastive Learning

Context-Based Meta-Reinforcement Learning With Bayesian Nonparametric Models

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Exploration With Task Information for Meta Reinforcement Learning

Meta-Reinforcement Learning with Dynamic Adaptiveness Distillation

Context meta-reinforcement learning via neuromodulation

Doubly Robust Augmented Transfer for Meta-Reinforcement Learning

Guided Meta-Policy Search

Transfer reinforcement learning via meta-knowledge extraction using auto-pruned decision trees

HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem

Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning

Model-based Adversarial Meta-Reinforcement Learning

Model-Based Transfer Learning for Contextual Reinforcement Learning

NoRML: No-Reward Meta Learning

Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling

Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation

Enhancing Context-Based Meta-Reinforcement Learning Algorithms Via An Efficient Task Encoder (Student Abstract)