Abstract: Offline meta reinforcement learning (OMRL) has emerged as a promising approach for avoiding online interaction and achieving strong generalization by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that alternating optimization between the context encoder and the policy can lead to performance improvements, as long as the context encoder follows the principle of maximizing the mutual information between the task variable $M$ and its latent representation $Z$ ($I(Z;M)$) while the policy adopts standard offline reinforcement learning (RL) algorithms conditioned on the learned task representation. Despite promising results, the theoretical justification for the performance improvements behind this intuition remains underexplored. Inspired by the return discrepancy scheme in the model-based RL field, we find that the previous optimization framework can be linked with the general RL objective of maximizing the expected return, thereby explaining performance improvements.
Furthermore, after scrutinizing this optimization framework, we find that it ignores the variation of the task representation in the alternating optimization process, which weakens the condition necessary for monotonic performance improvements and may therefore violate monotonicity. We name this issue \underline{task representation shift} and theoretically prove that monotonic performance improvements can be guaranteed with appropriate context encoder updates. We use different settings to rein in the task representation shift on three widely adopted training objectives concerning maximizing $I(Z;M)$ across different data qualities. Empirical results show that reining in the task representation shift can indeed improve performance.
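
For reference, the mutual information between the latent representation $Z$ and the task variable $M$ mentioned above is the standard quantity

$$
I(Z;M) = \mathbb{E}_{p(z,m)}\left[\log \frac{p(z,m)}{p(z)\,p(m)}\right] = H(Z) - H(Z \mid M).
$$

A rough sketch of the alternating scheme the abstract describes, written in hypothetical notation that is not taken from the paper ($\phi$ for the context encoder parameters, $\theta$ for the policy parameters, $J$ for the expected return, $k$ for the iteration index), might read

$$
\phi_{k+1} \approx \arg\max_{\phi} \, I(Z_{\phi};M), \qquad
\theta_{k+1} \approx \arg\max_{\theta} \, J\big(\pi_{\theta}(\cdot \mid s, z)\big), \quad z \sim q_{\phi_{k+1}}(\cdot \mid \text{context}).
$$

Under this assumed notation, the task representation shift would correspond to the change from $Z_{\phi_k}$ to $Z_{\phi_{k+1}}$ between iterations; the paper's own formal definition may differ.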

SplAgger: Split Aggregation for Meta-Reinforcement Learning

Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents

AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers

SAPG: Split and Aggregate Policy Gradients

Adaptive Aggregation for Safety-Critical Control

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

Curriculum in Gradient-Based Meta-Reinforcement Learning

Model-based Adversarial Meta-Reinforcement Learning

A Survey of Meta-Reinforcement Learning

Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

RLIF: Interactive Imitation Learning as Reinforcement Learning

Deep Reinforcement Learning For Sequence to Sequence Models

Scalable Ensembling For Mitigating Reward Overoptimisation

Set-based Meta-Interpolation for Few-Task Meta-Learning

Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning

Meta-Reinforcement Learning in Nonstationary and Nonparametric Environments

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

Diffused Task-Agnostic Milestone Planner

Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning