Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

Lanqing Li, Hai Zhang, Xinyu Zhang, Shatong Zhu, Junqiao Zhao, Pheng-Ann Heng
2024-02-04
Abstract: As a marriage between offline RL and meta-RL, the advent of offline meta-reinforcement learning (OMRL) has shown great promise in enabling RL agents to multi-task and quickly adapt while acquiring knowledge safely. Among these approaches, context-based OMRL (COMRL), a popular paradigm, aims to learn a universal policy conditioned on effective task representations. In this work, by examining several key milestones in the field of COMRL, we propose to integrate these seemingly independent methodologies into a unified information theoretic framework. Most importantly, we show that the pre-existing COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $\boldsymbol{M}$ and its latent representation $\boldsymbol{Z}$ by implementing various approximate bounds. Based on this theoretical insight and the information bottleneck principle, we arrive at a novel algorithm dubbed UNICORN, which exhibits remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures, attaining new state-of-the-art performance. We believe that our framework could open up avenues for new optimality bounds and COMRL algorithms.
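The abstract does not say which approximate bound each COMRL algorithm implements. As a generic illustration of one widely used lower bound on a mutual information such as \(I(Z;M)\), the sketch below computes the InfoNCE estimate \(\log N - \mathcal{L}_{\text{NCE}}\) from a batch of \(N\) matched (representation, task) pairs. The dot-product critic, the task embeddings `m`, and all function names are hypothetical choices for this sketch, not the paper's implementation.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product used as a (hypothetical) critic score f(z, m)."""
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    peak = max(xs)
    return peak + math.log(sum(math.exp(x - peak) for x in xs))


def infonce_lower_bound(
    z: List[List[float]], m: List[List[float]], temperature: float = 1.0
) -> float:
    """InfoNCE estimate of a lower bound on I(Z; M), in nats.

    z[i] is the latent representation of a context from task i and m[i] is a
    hypothetical embedding of task i. Row i treats (z[i], m[i]) as the positive
    pair and every (z[i], m[j]) with j != i as a negative.
    """
    n = len(z)
    if n != len(m) or n < 2:
        raise ValueError("need at least two matched (z, m) pairs")
    loss = 0.0
    for i in range(n):
        scores = [dot(z[i], m[j]) / temperature for j in range(n)]
        # Cross-entropy of identifying the positive index i among n candidates.
        loss += logsumexp(scores) - scores[i]
    loss /= n
    # I(Z; M) >= log n - loss; since loss >= 0, the estimate is capped at log n.
    return math.log(n) - loss


if __name__ == "__main__":
    # Four tasks whose representations are perfectly separated.
    tasks = [[5.0 if k == i else 0.0 for k in range(4)] for i in range(4)]
    print(infonce_lower_bound(tasks, tasks))  # close to log 4
    # Uninformative representations: every critic score is identical.
    same = [[1.0, 0.0] for _ in range(4)]
    print(infonce_lower_bound(same, same))  # approximately zero
```

With well-separated representations the estimate approaches its ceiling \(\log N\); with identical representations it drops to zero. Because InfoNCE can never certify more than \(\log N\) nats, larger batches (more negatives) are needed to bound larger mutual information.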
Machine Learning
What problem does this paper attempt to address?
This paper addresses the following problem in offline meta-reinforcement learning (OMRL): how can an algorithm maintain good generalization when it faces out-of-distribution (OOD) behavior policies? Specifically, the paper studies how to construct effective task representations that keep the model adaptable across different data qualities, model architectures, and context shifts. The authors point out that existing context-based offline meta-reinforcement learning (COMRL) methods, such as FOCAL, CORRO, and CSRO, perform poorly on OOD contexts, especially when the policy that collected the context differs substantially from the behavior policies in the training set. These weaknesses limit the effectiveness and reliability of such methods in practice. To overcome these challenges, the paper proposes a unified information-theoretic framework, showing that these methods implement different approximate bounds on the same mutual information objective \(I(Z;M)\) between the task variable \(M\) and its latent representation \(Z\), and uses it to derive a new algorithm, UNICORN, designed to improve generalization and robustness. UNICORN builds on the strengths of existing COMRL methods and further improves adaptability to context shifts by introducing the information bottleneck principle. Experimental results show that UNICORN achieves new state-of-the-art performance on multiple reinforcement learning benchmarks, on both in-distribution and out-of-distribution data.
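One way to see how the information bottleneck principle relates to \(I(Z;M)\) is a short chain-rule argument. This is a hedged sketch under an assumption not stated above: the context encoder \(p(z \mid x)\) computes \(Z\) from a context \(X\) alone, so \(M \to X \to Z\) is a Markov chain and \(I(Z;M \mid X) = 0\). Applying the chain rule of mutual information in both orders gives

$$
I(Z;X,M) = I(Z;X) + I(Z;M \mid X) = I(Z;M) + I(Z;X \mid M)
\quad\Longrightarrow\quad
I(Z;X) = I(Z;M) + I(Z;X \mid M).
$$

A generic bottleneck objective that keeps task information while compressing the context, with a trade-off weight \(\beta > 0\) (an illustrative choice, not necessarily the paper's loss), then splits as

$$
\max_{p(z \mid x)}\; I(Z;M) - \beta\, I(Z;X) \;=\; \max_{p(z \mid x)}\; (1-\beta)\, I(Z;M) - \beta\, I(Z;X \mid M).
$$

For \(0 < \beta < 1\), the first term rewards task-relevant information, while the second penalizes information that \(Z\) retains about the context beyond the task identity, such as features specific to the behavior policy that collected the data. That is the kind of information one would expect to hurt generalization to OOD contexts.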