Curriculum Goal-Conditioned Imitation for Offline Reinforcement Learning

Xiaoyun Feng, Li Jiang, Xudong Yu, Haoran Xu, Xiaoyan Sun, Jie Wang, Xianyuan Zhan, Wai Kin Chan
DOI: https://doi.org/10.1109/tg.2022.3224088
IF: 1.237
2022-01-01
IEEE Transactions on Games
Abstract: Offline reinforcement learning (RL) enables learning policies from precollected datasets without online data collection. Although it offers the possibility of surpassing the performance of the datasets, most existing offline RL algorithms struggle to compete with behavior cloning policies in many dataset settings. They must trade off policy improvement against the additional regularization needed to address distributional shift. In many cases, imitating a sequence of suboptimal subtrajectories in the data and properly "stitching" them toward an ideal future state may yield a more reliable policy, while avoiding the difficulties present in typical value-based offline RL algorithms. We borrow the idea of curriculum learning to embody this intuition. We construct a curriculum that progressively imitates a sequence of suboptimal trajectories conditioned on a series of carefully constructed future states and cumulative rewards as goals. These suboptimal trajectories gradually guide policy learning toward reaching the ideal goal states. We name our algorithm curriculum goal-conditioned imitation (CGI). Experimental results show that CGI achieves competitive performance against state-of-the-art offline RL algorithms, especially on challenging tasks with long horizons and sparse rewards.
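The core idea in the abstract is to imitate dataset actions conditioned on a goal made of a future state and the cumulative reward collected on the way to it. That idea can be illustrated with generic hindsight-style goal relabeling. The sketch below is not the authors' CGI implementation. The names `GoalSample`, `relabel_trajectory`, and `curriculum_stages`, the `horizon` window, and the rule that stages goals by sorting them on accumulated reward are all hypothetical. The paper's "carefully constructed" goals and curriculum are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

State = Tuple[float, ...]


@dataclass(frozen=True)
class GoalSample:
    state: State        # s_t
    action: int         # a_t taken in the dataset
    goal_state: State   # s_k, a later state in the same trajectory (k > t)
    goal_return: float  # r_t + ... + r_{k-1}: reward accumulated from s_t to s_k


def relabel_trajectory(
    states: Sequence[State],
    actions: Sequence[int],
    rewards: Sequence[float],
    horizon: int,
) -> List[GoalSample]:
    """Hypothetical helper: turn one offline trajectory into goal-conditioned
    imitation samples via hindsight relabeling.

    Assumed convention: a trajectory with T transitions stores T + 1 states
    s_0..s_T and T actions/rewards, where a_t and r_t belong to the transition
    s_t -> s_{t+1}. For each t, every later state s_k with
    t < k <= min(t + horizon, T) becomes a goal.
    """
    T = len(actions)
    if len(rewards) != T or len(states) != T + 1:
        raise ValueError("expected T + 1 states and T actions/rewards")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    samples: List[GoalSample] = []
    for t in range(T):
        cumulative = 0.0
        for k in range(t + 1, min(t + horizon, T) + 1):
            cumulative += rewards[k - 1]  # reward of transition s_{k-1} -> s_k
            samples.append(GoalSample(states[t], actions[t], states[k], cumulative))
    return samples


def curriculum_stages(samples: List[GoalSample], num_stages: int) -> List[List[GoalSample]]:
    """Hypothetical curriculum: sort goals by accumulated reward and release
    them in cumulative stages, so early stages imitate toward low-return goals
    and later stages add progressively higher-return ones."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(samples, key=lambda s: s.goal_return)  # stable sort
    n = len(ordered)
    return [ordered[: (n * (i + 1)) // num_stages] for i in range(num_stages)]


# Toy 1-D chain with a sparse reward on the final transition.
states = [(0.0,), (1.0,), (2.0,), (3.0,)]
actions = [1, 1, 1]
rewards = [0.0, 0.0, 1.0]
samples = relabel_trajectory(states, actions, rewards, horizon=2)
# goal returns: [0.0, 0.0, 0.0, 1.0, 1.0]
stages = curriculum_stages(samples, num_stages=2)
```

A goal-conditioned policy over (state, goal state, goal return) would then be trained by behavior cloning on each stage in turn. Sorting by accumulated reward is only one simple way to make later stages target higher-return goals. It shows how relabeled suboptimal subtrajectories can point toward better outcomes, not how CGI itself selects its goals.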