Abstract: Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that alternating optimization between the context encoder and the policy can lead to performance improvements, as long as the context encoder follows the principle of maximizing the mutual information between the task variable $M$ and its latent representation $Z$ ($I(Z;M)$) while the policy adopts standard offline reinforcement learning (RL) algorithms conditioned on the learned task representation. Despite promising results, the theoretical justification of performance improvements for such intuition remains underexplored. Inspired by the return discrepancy scheme in the model-based RL field, we find that the previous optimization framework can be linked with the general RL objective of maximizing the expected return, thereby explaining performance improvements.
Furthermore, after scrutinizing this optimization framework, we find it ignores the variation of the task representation in the alternating optimization process, which weakens the condition necessary for monotonic performance improvements, and may therefore violate the monotonicity. We name this issue "task representation shift" and theoretically prove that the monotonic performance improvements can be guaranteed with appropriate context encoder updates. We use different settings to rein in the task representation shift on three widely adopted training objectives concerning maximizing $I(Z;M)$ across different data qualities. Empirical results show that reining in the task representation shift can indeed improve performance.

Enhancing Context-Based Meta-Reinforcement Learning Algorithms Via An Efficient Task Encoder (Student Abstract)

Exploration With Task Information for Meta Reinforcement Learning

Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

Context-Based Meta-Reinforcement Learning With Bayesian Nonparametric Models

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Learn to Effectively Explore in Context-Based Meta-RL

Improving Context-Based Meta-Reinforcement Learning with Self-Supervised Trajectory Contrastive Learning

Leveraging the Efficiency of Multi-Task Robot Manipulation Via Task-Evoked Planner and Reinforcement Learning

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

Intrinsically Guided Exploration in Meta Reinforcement Learning

Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination

Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

On Context Distribution Shift in Task Representation Learning for Offline Meta RL

Meta-Reinforcement Learning with Dynamic Adaptiveness Distillation

Context meta-reinforcement learning via neuromodulation

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning

Contextual Transformer for Offline Meta Reinforcement Learning

Meta-Reinforcement Learning in Nonstationary and Nonparametric Environments

Model-based Adversarial Meta-Reinforcement Learning

Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation