Bridging Scenarios in Reinforcement Learning with Continuously Generated Relaying Predictive Models.

Kuo Li, Qing-Shan Jia
DOI: https://doi.org/10.1109/case49997.2022.9926635
2022-01-01
Abstract: Transfer learning is an effective way to reduce expensive interactions with the physical environment in reinforcement learning (RL). Because scenarios are correlated, both the prior policy and the historical experiences collected in the source domain may help to accelerate policy optimization in the target domain. However, without proper relaying scenarios, the discrepancy between domains may lead to sub-optimal policies or even negative transfer. In this paper, we first propose a continuously generated relaying predictive model (CRPM), which autonomously bridges the source and target domains with a series of gradually modified relaying scenarios. We then show experimentally that CRPM effectively reduces the interactions required for policy optimization in the target domain. We also combine CRPM with model-based RL, which further improves performance. CRPM also helps to improve a classical model-free algorithm by treating it as a special case of transfer learning within the same domain. Experimental results show that CRPM helps to avoid sub-optimal policies and outperforms other algorithms in both the source and target scenarios.
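The idea of bridging two domains with gradually modified relaying scenarios can be illustrated with a minimal sketch. The abstract does not say how CRPM generates its relaying models, so the code below is not the authors' method: it assumes, purely for illustration, that each scenario is described by a few numeric parameters and that intermediate scenarios are obtained by linear interpolation. The function name relay_scenarios and the parameters mass and friction are hypothetical.

```python
def relay_scenarios(source, target, steps):
    """Return steps + 1 parameter dicts that move linearly from source to target.

    Illustrative assumption only: the paper's CRPM learns predictive models,
    whereas this sketch merely interpolates hand-picked scenario parameters.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if source.keys() != target.keys():
        raise ValueError("source and target must define the same parameters")

    scenarios = []
    for i in range(steps + 1):
        alpha = i / steps  # 0.0 at the source scenario, 1.0 at the target
        scenarios.append(
            {k: (1 - alpha) * source[k] + alpha * target[k] for k in source}
        )
    return scenarios


# Hypothetical scenario parameters, chosen only for this example.
source = {"mass": 1.0, "friction": 0.5}
target = {"mass": 2.0, "friction": 0.1}
chain = relay_scenarios(source, target, steps=4)

for params in chain:
    print(params)
```

In a transfer setting, a policy would be trained on each scenario in the chain in turn, starting from the policy obtained on the previous one, so that no single step has to absorb the full discrepancy between source and target.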