Contextual Policy Transfer in Meta-Reinforcement Learning via Active Learning.

Jingchi Jiang, Lian Yan, Xuehui Yu, Yi Guan
DOI: https://doi.org/10.1007/978-3-031-20309-1_31
2022-01-01
Abstract: In meta-reinforcement learning (meta-RL), agents that consider the context when transferring source policies have been shown to outperform context-free approaches. However, existing approaches require large amounts of on-policy experience to adapt to novel tasks, which limits their practicality and sample efficiency. In this paper, we jointly perform off-policy meta-RL and active learning to generate the latent context of a novel task by reusing valuable experiences from source tasks. To calculate the importance weight of each source experience for adaptation, we employ maximum mean discrepancy (MMD) as the criterion, minimizing the distance between the experience distributions of the target task and the adapted source tasks in a reproducing kernel Hilbert space (RKHS). By integrating actively queried source experiences with a small amount of on-policy target experience, we demonstrate that this experience sampling benefits the fine-tuning of the contextual policy. We then incorporate the sampling scheme into a standard meta-RL framework and verify its effectiveness on four continuous control environments simulated in MuJoCo.
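As a rough illustration of the MMD criterion mentioned in the abstract, the sketch below estimates the squared MMD between a weighted set of source samples and a set of target samples. It is not the authors' implementation: the Gaussian (RBF) kernel, the biased estimator, and the names `rbf_kernel`, `mmd_squared`, `weights`, and `bandwidth` are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    # Pairwise squared Euclidean distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2ab.
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, weights=None, bandwidth=1.0):
    """Biased estimate of squared MMD between weighted source and uniform target samples.

    `weights` (hypothetical parameter) holds one non-negative importance weight
    per source sample; it is normalised to sum to 1. None means uniform weights.
    """
    n, m = source.shape[0], target.shape[0]
    if weights is None:
        w = np.full(n, 1.0 / n)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
    v = np.full(m, 1.0 / m)

    k_ss = rbf_kernel(source, source, bandwidth)
    k_tt = rbf_kernel(target, target, bandwidth)
    k_st = rbf_kernel(source, target, bandwidth)
    # Squared RKHS distance between the two (weighted) kernel mean embeddings.
    return float(w @ k_ss @ w + v @ k_tt @ v - 2.0 * (w @ k_st @ v))
```

Under these assumptions, source weights that concentrate on samples resembling the target data yield a smaller `mmd_squared` value than uniform weights, which is the sense in which MMD can act as a criterion for choosing importance weights.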