Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning.

Mitsuhiko Nakamoto, Yuexiang Zhai, Anikait Singh, Max Sobol Mark, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine
DOI: https://doi.org/10.48550/arxiv.2303.05479
2023-01-01
Abstract: A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities. Our approach, calibrated Q-learning (Cal-QL), accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q-values are at a reasonable scale. We refer to this property as calibration, and define it formally as providing a lower bound on the true value function of the learned policy and an upper bound on the value of some other (suboptimal) reference policy, which may simply be the behavior policy. We show that offline RL algorithms that learn such calibrated value functions lead to effective online fine-tuning, enabling us to take the benefits of offline initializations in online fine-tuning. In practice, Cal-QL can be implemented on top of conservative Q-learning (CQL) for offline RL within a one-line code change. Empirically, Cal-QL outperforms state-of-the-art methods on 9/11 fine-tuning benchmark tasks that we study in this paper. Code and video are available at https://nakamotoo.github.io/Cal-QL
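The "one-line code change" can be illustrated with a small sketch. It assumes a simplified CQL-style regularizer that pushes down Q-values at policy actions and pushes up Q-values at dataset actions, and it assumes the calibration step amounts to clipping the policy-action Q-values from below by an estimate of the reference policy's value (for example, a Monte Carlo return-to-go from the dataset). The function name, argument names, and the mean-based form of the penalty are hypothetical choices for illustration; the released implementation may differ, for instance by using a log-sum-exp over sampled actions.

```python
import numpy as np

def conservative_penalty(q_pi, q_data, v_ref=None):
    """Simplified CQL-style regularizer over a batch (illustrative only).

    q_pi   : (batch, n_samples) Q-values at actions sampled from the learned policy
    q_data : (batch,) Q-values at the actions stored in the offline dataset
    v_ref  : (batch,) assumed estimate of the reference policy's value at each
             state; passing None gives the uncalibrated penalty
    """
    if v_ref is not None:
        # Assumed calibration step: never push Q below the reference value.
        q_pi = np.maximum(q_pi, v_ref[:, None])
    # Push down Q at policy actions, push up Q at dataset actions.
    return float(np.mean(q_pi.mean(axis=1) - q_data))

# Toy batch of two states with two sampled policy actions each.
q_pi = np.array([[1.0, 3.0], [0.0, 2.0]])
q_data = np.array([2.0, 1.0])
v_ref = np.array([2.0, 5.0])

plain = conservative_penalty(q_pi, q_data)
calibrated = conservative_penalty(q_pi, q_data, v_ref)
```

In this toy batch, any policy-action Q-value already below the reference value is replaced by that reference value, which is a constant with respect to the Q-function. Minimizing the penalty therefore stops pushing those entries down any further, which is one way to obtain the lower bound that the calibration property asks for.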