Decoupled Prioritized Resampling for Offline RL
Yang Yue, Bingyi Kang, Xiao Ma, Qisen Yang, Gao Huang, Shiji Song, Shuicheng Yan
DOI: https://doi.org/10.1109/tnnls.2024.3488358
IF: 14.255
2024-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Offline reinforcement learning (RL) is challenged by the distributional shift problem. To address this problem, existing works mainly focus on designing sophisticated policy constraints between the learned policy and the behavior policy. However, these constraints are applied equally to well-performing and inferior actions through uniform sampling, which might negatively affect the learned policy. To alleviate this issue, we propose Offline Prioritized Experience Replay (OPER), featuring a class of priority functions designed to prioritize highly rewarding transitions, making them more frequently visited during training. Through theoretical analysis, we show that this class of priority functions induces an improved behavior policy, and when constrained to this improved policy, a policy-constrained offline RL algorithm is likely to yield a better solution. We develop two practical strategies to obtain priority weights: estimating advantages based on a fitted value network (OPER-A), or utilizing trajectory returns (OPER-R) for quick computation. OPER is a plug-and-play component for offline RL algorithms. As case studies, we evaluate OPER on five different algorithms: BC, TD3+BC, Onestep RL, CQL, and IQL. Extensive experiments demonstrate that both OPER-A and OPER-R significantly improve the performance of all baseline methods. Code and priority weights are available at https://github.com/sail-sg/OPER.
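As a rough illustration of the return-based variant (OPER-R) described in the abstract, the sketch below gives each transition a sampling weight derived from the return of the trajectory it belongs to, then draws a minibatch with those weights instead of uniformly. The min-max normalization, the `base` offset, the toy dataset, and all function names are assumptions made for this example; they are not taken from the paper or its repository, whose exact priority function may differ.

```python
import random


def trajectory_returns(trajectories):
    """Undiscounted return of each trajectory (a list of per-step rewards)."""
    return [sum(rewards) for rewards in trajectories]


def return_priorities(trajectories, base=0.1):
    """Give every transition a weight based on its trajectory's return.

    Assumption for this sketch: returns are min-max normalized to [0, 1]
    and shifted by `base`, so transitions from low-return trajectories
    are still sampled occasionally.
    """
    returns = trajectory_returns(trajectories)
    lo, hi = min(returns), max(returns)
    span = hi - lo
    weights = []
    for rewards, ret in zip(trajectories, returns):
        # If all returns are equal, fall back to identical weights.
        norm = (ret - lo) / span if span > 0 else 1.0
        weights.extend([norm + base] * len(rewards))
    return weights


def sample_indices(weights, batch_size, rng=None):
    """Draw transition indices with probability proportional to weight."""
    rng = rng or random.Random(0)
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


# Hypothetical toy dataset: three trajectories with returns 1, 3, and 20.
trajectories = [[0.0, 0.0, 1.0], [1.0, 2.0], [5.0, 5.0, 5.0, 5.0]]
weights = return_priorities(trajectories)
batch = sample_indices(weights, batch_size=1000)
```

In this toy setting, transitions from the highest-return trajectory (indices 5 to 8) dominate the sampled batch, which is the resampling effect the abstract refers to. The indices in `batch` could then feed the update step of any of the listed baselines in place of a uniformly sampled minibatch.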