A Trajectory Perspective on the Role of Data Sampling Techniques in Offline Reinforcement Learning.
Jinyi Liu, Yi Ma, Jianye Hao, Yujing Hu, Yan Zheng, Tangjie Lv, Changjie Fan
DOI: https://doi.org/10.5555/3635637.3662980
2024-01-01
Abstract: Offline reinforcement learning (RL) algorithms have gained considerable attention in recent years. However, the role of data sampling techniques in offline RL has been largely overlooked, even though they have shown potential to enhance performance in online RL. Recent research in offline RL indicates that applying sampling techniques directly to state transitions does not consistently improve performance. Therefore, to better leverage limited offline trajectory data, we investigate the impact of the data sampling process on offline RL algorithms from a trajectory perspective. We introduce a memory technique, (Prioritized) Trajectory Replay (TR/PTR), to facilitate trajectory data storage and sampling. Building on TR, we explore the potential in offline RL of trajectory backward sampling, a method that has already proven effective in online RL. Furthermore, to improve sampling efficiency, we examine how prioritized sampling based on various trajectory priority metrics influences offline training. When integrated with existing algorithms, data sampling and updates based on vanilla TR contribute to more stable training. In addition, the 13 trajectory priority metrics we propose for PTR perform strongly on the dataset types to which they apply, with best-case performance improvements exceeding 25%. These gains come at a slight extra cost during data sampling, highlighting the advantages of trajectory-based data sampling for offline RL.
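To make the idea of trajectory-level storage, backward sampling, and prioritized sampling concrete, the following is a minimal Python sketch. It is not the authors' implementation: the class name `TrajectoryReplay`, the transition layout, and the shifted-proportional weighting are assumptions made for illustration, and the undiscounted trajectory return is used only as a stand-in priority, since the abstract does not name the 13 metrics.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done).
Transition = Tuple[int, int, float, int, bool]


class TrajectoryReplay:
    """Illustrative buffer that stores whole trajectories instead of single transitions."""

    def __init__(self, seed: int = 0) -> None:
        self.trajectories: List[List[Transition]] = []
        self.priorities: List[float] = []
        self.rng = random.Random(seed)

    def add(self, trajectory: Sequence[Transition],
            priority: Optional[float] = None) -> None:
        traj = list(trajectory)
        if not traj:
            raise ValueError("trajectory must contain at least one transition")
        if priority is None:
            # Assumed stand-in metric: undiscounted return of the trajectory.
            priority = sum(step[2] for step in traj)
        self.trajectories.append(traj)
        self.priorities.append(priority)

    def _weights(self, prioritized: bool) -> List[float]:
        if not prioritized:
            return [1.0] * len(self.trajectories)
        # Shift priorities so that every sampling weight is strictly positive.
        low = min(self.priorities)
        return [p - low + 1e-3 for p in self.priorities]

    def sample_backward(self, num_trajectories: int,
                        prioritized: bool = False) -> Iterator[List[Transition]]:
        """Yield batches that walk the sampled trajectories from last step to first."""
        picked = self.rng.choices(self.trajectories,
                                  weights=self._weights(prioritized),
                                  k=num_trajectories)  # sampled with replacement
        cursors = [len(traj) - 1 for traj in picked]
        while any(c >= 0 for c in cursors):
            batch = []
            for i, traj in enumerate(picked):
                if cursors[i] >= 0:
                    batch.append(traj[cursors[i]])
                    cursors[i] -= 1
            yield batch
```

Each yielded batch holds one transition per still-unfinished trajectory, so a learner consuming the batches in order sees terminal transitions first and earlier steps afterwards. Passing `prioritized=True` biases which trajectories are drawn toward those with higher priority; any other trajectory-level metric can be supplied through the `priority` argument of `add`.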