Leveraging Transition Exploratory Bonus for Efficient Exploration in Hard-Transiting Reinforcement Learning Problems.

Shangdong Yang, Huihui Wang, Shaokang Dong, Xingguo Chen
DOI: https://doi.org/10.1016/j.future.2023.04.002
2023-01-01
Abstract: In reinforcement learning (RL), agents learn policies from spatiotemporal data generated by interaction with the environment. However, spatiotemporal data containing reward signals are often sparse: the agent receives a reward signal only when it reaches the goal state. This kind of problem is technically challenging because rewards are the basis of policy learning. Recently, prior knowledge of the task structure has been used to solve such problems, for example in hierarchical learning and goal-conditioned learning. In this paper, we consider a common task structure called Hard-Transiting. In a Hard-Transiting task, it becomes increasingly difficult for the agent to move forward as it approaches the goal state. We formalize the Hard-Transiting RL problem with sparse rewards, in which the transition probability decreases as the agent approaches the goal state. Inspired by reward bonuses in interactive spatiotemporal data, we propose two novel algorithms with efficient exploration to solve such problems. For the tabular setting, we adopt the Transition Exploratory Bonus (TEB) to encourage exploration in Hard-Transiting problems and propose Model-Based Interval Estimation-TEB (MBIE-TEB), in which TEB is incorporated into the value iteration phase of the conventional MBIE algorithm. Under the performance metric of sample complexity, we give a theoretical proof of an upper bound for MBIE-TEB. For the non-tabular setting, we propose Deep Q-Network-TEB (DQN-TEB), in which TEB is used as the intrinsic motivation in DQN. We test the proposed algorithms on two numerical tasks and one large-scale task. The experimental results demonstrate that, with the transition exploratory bonus, the proposed algorithms outperform the compared algorithms.
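To make the Hard-Transiting structure concrete, the following is a minimal sketch of a chain task in which the probability of a successful forward move decreases as the agent approaches the goal and a reward is given only at the goal state. It is an illustration under stated assumptions, not the paper's formalization: the chain layout, the linear decay schedule, the "stay in place on failure" rule, and the names `forward_prob`, `step`, and `expected_steps_to_goal` are all hypothetical. The TEB itself is not sketched, since the abstract does not give its form.

```python
import random


def forward_prob(state, n_states, p_start=0.9, p_end=0.1):
    """Probability that a forward move from `state` succeeds.

    Assumption: the probability decays linearly from p_start at state 0
    to p_end at the last non-goal state (n_states - 2).
    """
    if n_states <= 2:
        return p_start
    frac = state / (n_states - 2)
    return p_start + (p_end - p_start) * frac


def step(state, n_states, rng):
    """Attempt one forward move; on failure the agent stays in place.

    Sparse reward: 1.0 only when the goal state (n_states - 1) is reached.
    """
    if rng.random() < forward_prob(state, n_states):
        state += 1
    reward = 1.0 if state == n_states - 1 else 0.0
    return state, reward


def expected_steps_to_goal(n_states):
    """Expected number of forward attempts to reach the goal from state 0.

    Each state needs a geometric number of attempts with mean 1 / p.
    """
    return sum(1.0 / forward_prob(s, n_states) for s in range(n_states - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    state, steps, reward = 0, 0, 0.0
    while reward == 0.0:
        state, reward = step(state, 6, rng)
        steps += 1
    print("reached goal after", steps, "steps")
    print("expected steps:", expected_steps_to_goal(6))
```

In this sketch most of the expected effort is spent in the last few states, which is the regime where an exploration bonus tied to hard transitions would plausibly matter most.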