QFAE: Q-Function Guided Action Exploration for Offline Deep Reinforcement Learning

Teng Pang, Guoqiang Wu, Yan Zhang, Bingzheng Wang, Yilong Yin
DOI: https://doi.org/10.1016/j.patcog.2024.111032
2025-01-01
Abstract: Offline reinforcement learning (RL) aims to learn an optimal policy from offline data. A typical approach constrains the target policy to stay close to the offline data in order to reduce extrapolation error. However, this constraint can impede learning when the provided data is suboptimal. To address this issue, we analyze the impact of action exploration on policy learning; the analysis implies that a suitable action perturbation can improve policy learning. Inspired by this theoretical analysis, we propose a simple yet effective method named Q-Function guided Action Exploration (QFAE), which addresses offline RL by strengthening the exploration of the behavior policy with constrained action perturbations. Moreover, it can be viewed as a plug-and-play framework that can be embedded into existing policy constraint methods to improve their performance. Experimental results on the D4RL benchmark illustrate the effectiveness of our method when embedded into existing approaches.
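The abstract does not spell out the perturbation rule, so the following is only an illustrative sketch of one possible reading of "Q-function guided, constrained action perturbation": a dataset action is nudged along the gradient of a critic with respect to the action, and the nudge is clipped to a small budget. The critic `toy_q`, its gradient `toy_q_grad`, the function `perturb_actions`, and the parameters `step_size` and `epsilon` are all hypothetical names chosen for illustration and are not taken from the paper.

```python
import numpy as np


def toy_q(state, action):
    """Hypothetical critic: largest when the action equals tanh(state)."""
    target = np.tanh(state)
    return -np.sum((action - target) ** 2, axis=-1)


def toy_q_grad(state, action):
    """Analytic gradient of toy_q with respect to the action."""
    return -2.0 * (action - np.tanh(state))


def perturb_actions(state, action, step_size=0.25, epsilon=0.05,
                    low=-1.0, high=1.0):
    """Nudge dataset actions toward higher Q under a bounded perturbation.

    Assumption: the constraint is modeled as a per-dimension clip of the
    perturbation to [-epsilon, epsilon]; the paper may use a different one.
    """
    # Gradient ascent direction on Q with respect to the action.
    delta = step_size * toy_q_grad(state, action)
    # Keep the perturbed action close to the original dataset action.
    delta = np.clip(delta, -epsilon, epsilon)
    # Stay inside the valid action range.
    return np.clip(action + delta, low, high)


rng = np.random.default_rng(0)
states = rng.normal(size=(8, 3))
actions = rng.uniform(-1.0, 1.0, size=(8, 3))
explored = perturb_actions(states, actions)
```

In a policy constraint method, actions such as `explored` could stand in for the raw dataset actions in the behavior-cloning term, which is one way the plug-and-play description might be realized; this pairing is an assumption rather than something the abstract states.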