Abstract: Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that alternating optimization between the context encoder and the policy can lead to performance improvements, as long as the context encoder follows the principle of maximizing the mutual information between the task variable $M$ and its latent representation $Z$ ($I(Z;M)$) while the policy adopts the standard offline reinforcement learning (RL) algorithms conditioning on the learned task representation. Despite promising results, the theoretical justification of performance improvements for such intuition remains underexplored. Inspired by the return discrepancy scheme in the model-based RL field, we find that the previous optimization framework can be linked with the general RL objective of maximizing the expected return, thereby explaining performance improvements.
Furthermore, after scrutinizing this optimization framework, we find it ignores the variation of the task representation in the alternating optimization process, which weakens the condition necessary for monotonic performance improvements, and may therefore violate the monotonicity. We name this issue \underline{task representation shift} and theoretically prove that the monotonic performance improvements can be guaranteed with appropriate context encoder updates. We use different settings to rein in the task representation shift on three widely adopted training objectives concerning maximizing $I(Z;M)$ across different data qualities. Empirical results show that reining in the task representation shift can indeed improve performance.
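As a rough illustration of the two quantities the abstract refers to, the sketch below computes an InfoNCE-style contrastive loss, a commonly used surrogate for maximizing $I(Z;M)$ (the standard bound is $I \geq \log N - \mathcal{L}$ for $N$ tasks), and a simple distance between task embeddings before and after an encoder update. The function names, the choice of InfoNCE, and the Euclidean measure of shift are assumptions made for illustration; the abstract does not say which objectives or which definition of shift the authors use.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(queries, keys, temperature=0.1):
    """InfoNCE loss over N tasks (hypothetical helper).

    Row i of `queries` and row i of `keys` are assumed to be embeddings of
    two contexts drawn from the same task; all other rows act as negatives.
    """
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive
    return total / len(q)


def representation_shift(z_old, z_new):
    """Mean Euclidean distance between per-task embeddings produced by the
    encoder before and after an update (an assumed, illustrative measure)."""
    dists = [math.dist(a, b) for a, b in zip(z_old, z_new)]
    return sum(dists) / len(dists)
```

In an alternating training loop, a quantity like `representation_shift` could be logged after each encoder update to see how far the representation the policy conditions on has moved; how such a measurement would relate to the paper's monotonicity condition is not stated in the abstract.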

Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

Meta-Reinforcement Learning with Dynamic Adaptiveness Distillation

Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling

On Context Distribution Shift in Task Representation Learning for Offline Meta RL

Offline Meta Reinforcement Learning with In-Distribution Online Adaptation

MetaRM: Shifted Distributions Alignment via Meta-Learning

Predictive value of a positive exercise stress testing and correlations with cardiovascular risk factors.

Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

Model-based Adversarial Meta-Reinforcement Learning

Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination

A Survey of Meta-Reinforcement Learning

Stochastic Dynamic Power Dispatch with High Generalization and Few-Shot Adaption via Contextual Meta Graph Reinforcement Learning

Context Shift Reduction for Offline Meta-Reinforcement Learning

Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

Reward Shaping via Meta-Learning

Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning

Meta Reinforcement Learning with Finite Training Tasks -- a Density Estimation Approach

Meta-Inverse Reinforcement Learning with Probabilistic Context Variables

Task-Distributionally Robust Data-Free Meta-Learning.

Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning