Backtracking Exploration for Reinforcement Learning

Xingguo Chen, Zening Chen, Dingyuanhao Sun, Yang Gao
DOI: https://doi.org/10.1145/3627676.3627687
2023-01-01
Abstract: Exploration of the behavior policy plays an important role in reinforcement learning, as it helps learning algorithms escape local optima. Taking linear value function approximation as an example, exploration directly affects which states are sampled and thereby alters the state distribution. This distribution is a component of the key matrix, and the magnitude of the smallest eigenvalue of the key matrix is proportional to the convergence speed. However, existing exploration methods are constrained by the MDP chain and require step-by-step backtracking to reach the target policy distribution. This paper breaks the assumption that the action settings of the training environment must be identical to those of the testing environment by introducing state resetting in the training environment, and it proposes a backtracking exploration algorithm with a time window and punishment. The algorithm can be combined directly with existing exploration strategies and value function update rules, and it has the potential to become a new paradigm for the training process in reinforcement learning. Experimental results validate the effectiveness of the proposed algorithm.
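The link between the state distribution and the key matrix can be illustrated with a small numerical sketch. It assumes the standard linear TD(0) form of the key matrix, A = Φᵀ D (I − γP) Φ, and takes the smallest eigenvalue of its symmetric part; the abstract does not state either choice, so both are assumptions. The two-state chain, the features, and the function names `key_matrix` and `min_eig_sym_2x2` are hypothetical and not taken from the paper.

```python
import math

def key_matrix(phi, d, P, gamma):
    """Assumed TD(0) key matrix A = Phi^T D (I - gamma P) Phi.

    phi: per-state feature vectors, d: state sampling distribution,
    P: transition matrix under the policy, gamma: discount factor.
    """
    n, k = len(phi), len(phi[0])
    A = [[0.0] * k for _ in range(k)]
    for s in range(n):
        # Expected next-state feature vector when leaving state s.
        nxt = [sum(P[s][t] * phi[t][j] for t in range(n)) for j in range(k)]
        for i in range(k):
            for j in range(k):
                A[i][j] += d[s] * phi[s][i] * (phi[s][j] - gamma * nxt[j])
    return A

def min_eig_sym_2x2(A):
    """Smallest eigenvalue of the symmetric part of a 2x2 matrix (closed form)."""
    a, c = A[0][0], A[1][1]
    b = 0.5 * (A[0][1] + A[1][0])
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)

# Hypothetical two-state chain with tabular (one-hot) features.
phi = [[1.0, 0.0], [0.0, 1.0]]
P = [[0.5, 0.5], [0.5, 0.5]]
gamma = 0.9

uniform = min_eig_sym_2x2(key_matrix(phi, [0.5, 0.5], P, gamma))
skewed = min_eig_sym_2x2(key_matrix(phi, [0.9, 0.1], P, gamma))
print("uniform sampling:", uniform)
print("skewed sampling: ", skewed)
```

In this toy chain the uniform distribution is the stationary distribution of `P`, and sampling from it yields a larger smallest eigenvalue than the skewed distribution does. This is the kind of effect the abstract attributes to exploration: changing which states are visited changes the key matrix.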