Adaptable Conservative Q-Learning for Offline Reinforcement Learning

Lyn Qiu, Xu Li, Lenghan Liang, Mingming Sun, Junchi Yan
DOI: https://doi.org/10.1007/978-981-99-8435-0_16
2024-01-01
Abstract: The out-of-distribution (OOD) problem is a major obstacle in offline reinforcement learning. Existing approaches conservatively estimate the Q-values of OOD actions. However, they apply a constant constraint throughout policy learning, and the resulting excessive conservatism can impair learning. Moreover, task distributions differ across environments and behavior policies, so each setting calls for a tailored solution. To address these challenges, we propose Adaptable Conservative Q-Learning (ACQ). ACQ uses the distribution of Q-values for each fixed dataset to construct a highly generalizable metric that balances the conservative constraint against the training objective. Experimental results show that ACQ is competitive with a variety of offline RL algorithms. It also significantly improves the normalized return of CQL on most D4RL MuJoCo locomotion tasks.
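The abstract contrasts ACQ with methods that apply a constant conservative constraint and uses CQL as its main baseline. For reference, here is a minimal sketch of a CQL(H)-style conservative penalty with a fixed weight `alpha`. It assumes discrete actions and omits the sampled-action estimation that continuous control needs. The function names are hypothetical. ACQ's adaptive, Q-distribution-based metric is not reproduced here, because the abstract does not specify its form.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values: Sequence[Sequence[float]],
                data_actions: Sequence[int]) -> float:
    """Hypothetical helper: CQL(H)-style penalty averaged over a batch.

    q_values[i][a] is Q(s_i, a) for every discrete action a, and
    data_actions[i] is the action stored in the dataset for state s_i.
    The log-sum-exp term is a soft maximum over all actions, including
    OOD ones, so minimizing the penalty pushes those Q-values down while
    pushing up Q on the dataset action.
    """
    if not q_values or len(q_values) != len(data_actions):
        raise ValueError("need one dataset action per non-empty Q row")
    total = 0.0
    for row, a in zip(q_values, data_actions):
        total += logsumexp(row) - row[a]
    return total / len(q_values)


def cql_loss(td_loss: float, penalty: float, alpha: float = 1.0) -> float:
    """Critic loss with a *constant* conservative weight alpha (assumed form)."""
    return td_loss + alpha * penalty


if __name__ == "__main__":
    q = [[1.0, 2.0], [0.5, 0.5]]  # toy Q-values for two states, two actions
    p = cql_penalty(q, [1, 0])
    print(round(p, 4))  # about 0.5032
    print(cql_loss(td_loss=0.1, penalty=p, alpha=5.0))
```

With `alpha` fixed, the penalty has the same strength at every stage of training and on every dataset. According to the abstract, ACQ instead derives the balance between this kind of constraint and the training objective from each dataset's Q-value distribution.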