Curriculum Offline Reinforcement Learning

Yuanying Cai,Chuheng Zhang,Hanye Zhao,Li Zhao,Jiang Bian
2023-01-01
Abstract:Offline reinforcement learning holds the promise of obtaining powerful agents from large datasets. To achieve this, a good algorithm should always benefit from (or at least does not degenerate by) adding more samples, even if the samples are not collected by expert policies. However, we observe that many popular offline RL algorithms do not possess such a property and sometimes suffers from adding heterogeneous or poor samples to the dataset. Empirically we show that, given a stage in the learning process, not all samples are useful for these algorithms. Specifically, the agent can learn more efficiently with only the samples collected by a policy similar to the current policy. This indicates that different samples may contribute to different stages of the training process, and therefore we propose Curriculum Offline Reinforcement Learning (CUORL) to equip the previous methods with the such a favorable property. In CUORL, we select the samples that are likely to be generated by the current policy to train the agent. Empirically, we show that CUORL can prevent the negative impact of adding the samples from poor policies and always improves the performance with more samples (even from random policies). Moreover, CUORL also achieves state-of-the-art performance on standard D4RL datasets, which indicates the potential of curriculum learning for offline RL.
What problem does this paper attempt to address?