LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework

Woojun Kim, Jeonghye Kim, Youngchul Sung
2023-10-05
Abstract: In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic model. The proposed framework learns to integrate a set of diverse exploration strategies so that the agent can adaptively select the most effective exploration strategy over time to realize a relevant exploration-exploitation trade-off for each given task. The effectiveness of the proposed exploration framework is demonstrated by various experiments in the MiniGrid and Atari environments.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to explore effectively in reinforcement learning (RL). Although many sophisticated exploration methods have been proposed, none of them works well across all tasks. For example, methods based on intrinsic motivation perform well in some environments but can hurt performance in reward-dense ones. Temporally extended exploration methods beat simple strategies on hard-exploration tasks, yet in some cases plain ε-greedy exploration performs even better. Choosing the most suitable exploration strategy for a given task is therefore difficult and time-consuming. Moreover, different stages of training on the same task may call for different exploration strategies, so a single strategy fixed in advance may be suboptimal over the course of training.
To address these problems, the paper proposes LESSON (Learning to Integrate Exploration Strategies via an Option Framework), a unified exploration framework that integrates multiple exploration strategies so that the agent can adaptively select the most effective one for the situation at hand. This lets the agent strike an effective exploration-exploitation trade-off across different tasks and learning stages. LESSON builds on the option-critic model, and it overcomes the difficulties of naively applying option-critic to exploration strategies by using off-policy learning together with carefully designed objective functions and action-value functions. Experiments show that LESSON significantly outperforms existing exploration methods in the MiniGrid and Atari environments.
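The core idea of treating each exploration strategy as an option, with a high-level policy deciding which strategy to run and when to switch, can be sketched in a few lines. This is a minimal tabular illustration under stated assumptions, not LESSON's actual algorithm. The three strategies (greedy, ε-greedy, uniform random), the fixed termination probability `beta`, and the class name `OptionSelector` are hypothetical choices for illustration. The option values are updated with the standard intra-option Q-learning target from the original option-critic formulation, not with the paper's own objective and action-value functions.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional

# Hypothetical exploration strategies: each maps the task policy's
# Q-values for the current state to a primitive action.
def greedy(q_values: List[float], rng: random.Random) -> int:
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy(q_values: List[float], rng: random.Random, eps: float = 0.1) -> int:
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return greedy(q_values, rng)

def uniform_random(q_values: List[float], rng: random.Random) -> int:
    return rng.randrange(len(q_values))

STRATEGIES = [greedy, epsilon_greedy, uniform_random]  # one option per strategy

@dataclass
class OptionSelector:
    """Tabular policy over exploration strategies (illustrative assumption)."""
    n_options: int
    alpha: float = 0.1   # learning rate for option values
    gamma: float = 0.99  # discount factor
    beta: float = 0.2    # fixed termination probability (option-critic learns beta(s))
    eps: float = 0.1     # epsilon-greedy exploration over options
    q_omega: Dict[Hashable, List[float]] = field(default_factory=dict)
    current: Optional[int] = None  # index of the active option, None if none is active
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def values(self, s: Hashable) -> List[float]:
        # Lazily initialize option values for unseen states.
        return self.q_omega.setdefault(s, [0.0] * self.n_options)

    def pick_option(self, s: Hashable) -> int:
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.n_options)
        v = self.values(s)
        return max(range(self.n_options), key=lambda o: v[o])

    def act(self, s: Hashable, task_q: List[float]) -> int:
        # (Re)select an option at the start or when the active option terminates.
        if self.current is None or self.rng.random() < self.beta:
            self.current = self.pick_option(s)
        return STRATEGIES[self.current](task_q, self.rng)

    def update(self, s: Hashable, r: float, s_next: Hashable, done: bool) -> None:
        # Intra-option Q-learning: continue the option with prob. 1 - beta,
        # otherwise switch to the best option in the next state.
        v_next = self.values(s_next)
        u = (1 - self.beta) * v_next[self.current] + self.beta * max(v_next)
        target = r + (0.0 if done else self.gamma * u)
        v = self.values(s)
        v[self.current] += self.alpha * (target - v[self.current])
        if done:
            self.current = None  # a new episode starts with a fresh option choice

sel = OptionSelector(n_options=len(STRATEGIES))
action = sel.act("s0", [0.1, 0.9, 0.3])
sel.update("s0", r=1.0, s_next="s1", done=False)
```

The option values here learn which strategy tends to pay off in which state, which mirrors the adaptive selection described above. A real agent would also train the task Q-values from the same transitions, ideally off-policy, since the behavior is generated by whichever strategy is active.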