Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

Jianlan Luo, Perry Dong, Jeffrey Wu, Aviral Kumar, Xinyang Geng, Sergey Levine
2023-10-18
Abstract: The offline reinforcement learning (RL) paradigm provides a general recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. While policy constraints, conservatism, and other methods for mitigating distributional shifts have made offline reinforcement learning more effective, the continuous action setting often necessitates various approximations for applying these techniques. Many of these challenges are greatly alleviated in discrete action settings, where offline RL constraints and regularizers can often be computed more precisely or even exactly. In this paper, we propose an adaptive scheme for action quantization. We use a VQ-VAE to learn state-conditioned action quantization, avoiding the exponential blowup that comes with naïve discretization of the action space. We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme. We further validate our approach on a set of challenging long-horizon complex robotic manipulation tasks in the Robomimic environment, where our discretized offline RL algorithms are able to improve upon their continuous counterparts by 2-3x. Our project page is at https://saqrl.github.io/
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the difficulties that continuous action spaces create for offline reinforcement learning (RL). When the dataset is narrow (that is, the behaviors in it are close to deterministic, as in expert demonstration data), the approximations needed to apply offline RL constraints to continuous actions can degrade performance or increase sensitivity to hyperparameters. The authors propose an adaptive scheme, State-Conditioned Action Quantization (SAQ), which uses a vector-quantized variational autoencoder (VQ-VAE) to learn an action quantization conditioned on the state. This avoids the exponential growth in the number of actions caused by naive discretization, and it allows the policy constraints or conservative regularization terms of offline RL to be computed more precisely, which in turn improves the performance of offline RL algorithms on robotic skill learning tasks. Specifically, the main contributions of the paper are as follows:

1. **Propose a practical method**: SAQ learns quantized action representations to improve the performance of continuous-action offline RL methods across a range of robot learning tasks.
2. **Provide a general method**: State-conditioned action discretization is learned and applied to three offline RL methods: Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), and Behavior Regularized Actor-Critic (BRAC).
3. **Demonstrate significant performance improvement**: Especially on "narrow" datasets (such as expert data), the discretized version of each method usually outperforms its continuous-action counterpart on common benchmark tasks.
4. **Verify the effectiveness of the method**: The approach is evaluated both on standard offline RL benchmarks and on complex robotic manipulation tasks in the challenging Robomimic environment. The results show a significant improvement over previous offline RL methods.

Through these contributions, the paper aims to make offline RL easier to implement and more effective and stable in practical applications, especially in robot learning.
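
To make the two central ideas above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It contrasts the size of a naive per-dimension grid with a small learned codebook, shows the nearest-code lookup at the heart of a VQ-VAE, and computes a CQL-style conservative term exactly by summing over all discrete codes. Everything named here is an assumption made for illustration: the dimensions (a 7-dimensional action, 16 codes), the untrained random linear `encode`/`decode` maps, and the helper names `naive_bin_count`, `quantize`, and `cql_regularizer`. The summary does not specify the paper's network architecture or training losses, and those are omitted.

```python
import numpy as np


def naive_bin_count(bins_per_dim: int, action_dim: int) -> int:
    """Number of discrete actions produced by a uniform per-dimension grid."""
    return bins_per_dim ** action_dim


def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each latent of shape (N, D), the index of the nearest code in (K, D)."""
    # Squared Euclidean distance between every latent and every codebook vector.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def cql_regularizer(q_values: np.ndarray, data_codes: np.ndarray) -> float:
    """CQL-style term: mean over states of logsumexp_a Q(s, a) - Q(s, a_data).

    With K discrete codes, the log-sum-exp is an exact sum over all actions.
    """
    m = q_values.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    lse = (m + np.log(np.exp(q_values - m).sum(axis=1, keepdims=True)))[:, 0]
    q_data = q_values[np.arange(len(data_codes)), data_codes]
    return float((lse - q_data).mean())


# Hypothetical sizes chosen only for this example.
state_dim, action_dim, latent_dim, num_codes = 4, 7, 8, 16
rng = np.random.default_rng(0)

# Untrained random linear maps standing in for the learned encoder and decoder.
W_enc = rng.normal(size=(state_dim + action_dim, latent_dim))
W_dec = rng.normal(size=(state_dim + latent_dim, action_dim))
codebook = rng.normal(size=(num_codes, latent_dim))


def encode(states: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Map (state, continuous action) pairs to discrete code indices."""
    latents = np.concatenate([states, actions], axis=1) @ W_enc
    return quantize(latents, codebook)


def decode(states: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Map (state, code index) pairs back to continuous actions."""
    return np.concatenate([states, codebook[codes]], axis=1) @ W_dec


states = rng.normal(size=(5, state_dim))
actions = rng.normal(size=(5, action_dim))
codes = encode(states, actions)      # one integer code per transition
recon = decode(states, codes)        # continuous actions sent to the robot

print(naive_bin_count(10, action_dim))  # 10 ** 7 = 10,000,000 grid cells
print(num_codes)                        # 16 state-conditioned codes
```

Because the decoder also receives the state, the same code index can map to different continuous actions in different states, which is how a small codebook can stand in for a large grid. A discrete-action Q-function then only needs one output per code, so `cql_regularizer` can be evaluated without sampling actions.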