A Rank-Based Sampling Framework for Offline Reinforcement Learning

Yichen Shen, Zhou Fang, Yunkun Xu, Yu Cao, Jiangcheng Zhu
DOI: https://doi.org/10.1109/cei52496.2021.9574597
2021-01-01
Abstract: Offline reinforcement learning (RL) is an attractive approach that learns a policy purely from a previously collected dataset, without additional interaction with the environment. However, it suffers from a data-quality problem: the performance of the learned policy depends heavily on the previously collected dataset. In this paper, we propose a novel and robust sampling technique, Rank-Based Sampling (RBS), to address this issue in offline RL. RBS samples transitions with high returns more frequently while reducing the influence of outliers. To do so, each transition is assigned an importance rank that is used for non-uniform sampling. RBS approximates the optimal value of each state in the offline dataset with an upper-envelope neural network. The method is compatible with many existing offline RL methods, e.g., Batch-Constrained deep Q-learning (BCQ), a hyperparameter-robust offline RL algorithm. We provide a comprehensive experimental study in simulation environments. BCQ with RBS achieves new state-of-the-art performance, outperforming vanilla BCQ with uniform sampling and two other algorithms.
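The abstract does not state the exact weighting RBS uses, so the following is only a minimal sketch of what rank-based, non-uniform sampling can look like. It assumes the rank-based priority from prioritized experience replay (Schaul et al.), p_i = (1 / rank_i)^alpha, normalized into probabilities. The per-transition `scores` are hypothetical inputs (for example, each transition's return, or its return measured against an upper-envelope estimate of the state value), and the function names `rank_probabilities` and `sample_batch` are illustrative, not taken from the paper.

```python
import random
from typing import List, Optional, Sequence


def rank_probabilities(scores: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Turn per-transition scores into sampling probabilities by rank.

    Assumed scheme (rank-based prioritized replay): the highest score gets
    rank 1 and priority (1 / rank) ** alpha. Only the ordering of the scores
    matters, so an extreme outlier cannot dominate the distribution.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    # Indices ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    priorities = [0.0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        priorities[idx] = (1.0 / rank) ** alpha
    total = sum(priorities)
    # Normalize so the probabilities sum to 1.
    return [p / total for p in priorities]


def sample_batch(
    scores: Sequence[float],
    batch_size: int,
    alpha: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Draw transition indices with replacement according to rank priority."""
    rng = rng or random.Random()
    probs = rank_probabilities(scores, alpha)
    return rng.choices(range(len(scores)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Hypothetical scores for four transitions; 1000.0 acts as an outlier.
    scores = [1.0, 5.0, 3.0, 1000.0]
    print(rank_probabilities(scores))
    print(sample_batch(scores, batch_size=8, rng=random.Random(0)))
```

With these scores the probabilities come out as roughly 0.12, 0.24, 0.16 and 0.48, and they stay exactly the same if the outlier 1000.0 is replaced by 6.0, which illustrates why ranking limits the influence of outliers. Setting `alpha=0` makes every priority equal to 1, recovering the uniform sampling used by vanilla BCQ.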