Abstract: Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that alternating optimization between the context encoder and the policy can lead to performance improvements, as long as the context encoder follows the principle of maximizing the mutual information between the task variable $M$ and its latent representation $Z$ ($I(Z;M)$) while the policy adopts standard offline reinforcement learning (RL) algorithms conditioned on the learned task representation. Despite promising results, the theoretical justification of performance improvements for such intuition remains underexplored. Inspired by the return discrepancy scheme in the model-based RL field, we find that the previous optimization framework can be linked with the general RL objective of maximizing the expected return, thereby explaining performance improvements.
Furthermore, after scrutinizing this optimization framework, we find it ignores the variation of the task representation in the alternating optimization process, which weakens the condition necessary for monotonic performance improvements, and may therefore violate monotonicity. We name this issue \underline{task representation shift} and theoretically prove that monotonic performance improvements can be guaranteed with appropriate context encoder updates. We use different settings to rein in the task representation shift on three widely adopted training objectives concerning maximizing $I(Z;M)$ across different data qualities. Empirical results show that reining in the task representation shift can indeed improve performance.

Correcting Data Distribution Mismatch in Offline Meta-Reinforcement Learning with Few-Shot Online Adaptation

Offline Meta Reinforcement Learning with In-Distribution Online Adaptation

Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

Offline Meta-Reinforcement Learning with Advantage Weighting

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

On Context Distribution Shift in Task Representation Learning for Offline Meta RL

Cost-aware Offline Safe Meta Reinforcement Learning with Robust In-Distribution Online Task Adaptation

Sample Efficient Offline-to-Online Reinforcement Learning

Domain Adaptation for Offline Reinforcement Learning with Limited Samples

Context Shift Reduction for Offline Meta-Reinforcement Learning

Augmenting Offline RL with Unlabeled Data

Variable-Shot Adaptation for Online Meta-Learning

Robust Offline Reinforcement Learning from Low-Quality Data

Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation

Boosting Offline Reinforcement Learning via Data Rebalancing

Improving Generalization in Offline Reinforcement Learning Via Adversarial Data Splitting

Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation

Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

Improving Offline Reinforcement Learning with Inaccurate Simulators

State Deviation Correction for Offline Reinforcement Learning