Efficient and Stable Information Directed Exploration for Continuous Reinforcement Learning

Mingzhe Chen, Xi Xiao, Wanpeng Zhang, Xiaotian Gao
DOI: https://doi.org/10.1109/icassp43922.2022.9746211
2022-01-01
Abstract: In this paper, we investigate the exploration-exploitation dilemma in reinforcement learning. We adapt information directed sampling, an exploration framework that measures the information gain of a policy, to continuous reinforcement learning. To stabilize off-policy learning and further improve sample efficiency, we propose using a randomized learning target and dynamically adjusting the update-to-data ratio for different parts of the neural network model. Experiments show that our approach significantly improves over existing methods and successfully completes tasks with highly sparse reward signals.
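To make the idea of information directed sampling concrete, the following is a minimal sketch of the general principle for a small discrete action set: pick the action that minimizes the ratio of squared estimated regret to estimated information gain. It is not the paper's continuous-control method. The function name `ids_action`, the parameters `lam`, `noise_var`, and `eps`, and the use of ensemble disagreement as an uncertainty proxy are all assumptions made for illustration.

```python
import numpy as np

def ids_action(q_ensemble, lam=1.0, noise_var=1.0, eps=1e-8):
    """Pick an action by minimizing regret^2 / information gain.

    q_ensemble: array of shape (n_members, n_actions) holding Q-value
    estimates for a single state from a (hypothetical) ensemble.
    All parameter names and defaults are illustrative assumptions.
    """
    q = np.asarray(q_ensemble, dtype=float)
    mu = q.mean(axis=0)      # mean value estimate per action
    sigma = q.std(axis=0)    # ensemble disagreement per action

    # Conservative regret estimate: the best optimistic value minus
    # each action's pessimistic value.
    regret = np.max(mu + lam * sigma) - (mu - lam * sigma)

    # Information-gain proxy: grows with the uncertainty about an action.
    # eps keeps the ratio finite when the ensemble fully agrees.
    info_gain = np.log1p(sigma**2 / noise_var) + eps

    ratio = regret**2 / info_gain
    return int(np.argmin(ratio)), ratio

# Action 0 looks slightly better on average but is fully certain;
# action 1 is uncertain, so sampling it is more informative.
action, ratio = ids_action([[1.0, 0.4], [1.0, 1.4]])
```

When the ensemble fully agrees on every action, the regret term dominates and the rule reduces to greedy selection; when an action is uncertain, its larger information gain can outweigh a somewhat lower mean estimate.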