UAC: Offline Reinforcement Learning with Uncertain Action Constraint
Jiayi Guan, Shangding Gu, Zhijun Li, Jing Hou, Yiqin Yang, Guang Chen, Changjun Jiang
DOI: https://doi.org/10.1109/tcds.2023.3287987
IF: 4.546
2024-01-01
IEEE Transactions on Cognitive and Developmental Systems
Abstract: Offline reinforcement learning (RL) algorithms promise to learn policies directly from offline datasets without interacting with the environment. This setting enables real-world RL applications, particularly in robotics and autonomous driving, where sampling is costly and dangerous. However, existing offline RL algorithms suffer from degraded performance due to extrapolation error caused by out-of-distribution (OOD) actions. In this work, we propose an offline RL algorithm with an uncertain action constraint (UAC). The design principle of UAC is to minimize extrapolation error by eliminating unknown and uncertain actions. Concretely, we first theoretically analyze how different types of actions affect extrapolation error. Based on this analysis, we propose an action-constrained strategy that exploits the uncertainty of the environmental dynamics model to eliminate unknown and uncertain actions during Q-value evaluation. Furthermore, we propose using a convex combination of trajectory information and Gaussian noise to increase the probability of generating optimal actions. Finally, we conduct comparison and ablation experiments on the standard D4RL benchmark. Experimental results indicate that UAC achieves competitive performance, especially on robotic manipulation tasks.
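The abstract does not give UAC's exact formulas, so the following is only a minimal sketch of the two ideas it names, under loudly stated assumptions. It assumes that (1) dynamics-model uncertainty is measured as the disagreement (largest per-dimension standard deviation) across a hypothetical ensemble of learned models, (2) "trajectory information" is the dataset's next action, mixed with Gaussian noise as `lam * a_data + (1 - lam) * eps`, and (3) candidate actions whose uncertainty exceeds a hypothetical `threshold` are excluded from the max in the Q-value backup. The function names `convex_candidates`, `ensemble_uncertainty`, and `constrained_target`, and the values of `lam`, `sigma`, and `threshold`, are illustrative inventions, not the authors' implementation.

```python
# Hypothetical sketch of an uncertainty-constrained Q backup; NOT the UAC authors' code.
import random
import statistics
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Sequence[float], Sequence[float]], Vector]


def convex_candidates(data_action: Sequence[float], n: int, lam: float,
                      sigma: float, rng: random.Random) -> List[Vector]:
    """Assumed mixing rule: lam * a_data + (1 - lam) * eps, eps ~ N(0, sigma^2).

    Each candidate is clipped to the action bounds [-1, 1].
    """
    cands = []
    for _ in range(n):
        mixed = [lam * a + (1.0 - lam) * rng.gauss(0.0, sigma) for a in data_action]
        cands.append([max(-1.0, min(1.0, x)) for x in mixed])
    return cands


def ensemble_uncertainty(models: Sequence[Model], state: Sequence[float],
                         action: Sequence[float]) -> float:
    """Assumed uncertainty measure: largest per-dimension std of next-state predictions."""
    preds = [m(state, action) for m in models]
    return max(statistics.pstdev(dim) for dim in zip(*preds))


def constrained_target(reward: float, next_state: Sequence[float],
                       data_next_action: Sequence[float],
                       q_fn: Callable[[Sequence[float], Sequence[float]], float],
                       models: Sequence[Model], threshold: float,
                       gamma: float = 0.99, n: int = 16, lam: float = 0.8,
                       sigma: float = 0.5, seed: int = 0) -> float:
    """Bellman target r + gamma * max Q(s', a) over low-uncertainty candidates only."""
    rng = random.Random(seed)
    cands = convex_candidates(data_next_action, n, lam, sigma, rng)
    # Drop candidates the dynamics ensemble disagrees on (treated as unknown/uncertain).
    kept = [a for a in cands
            if ensemble_uncertainty(models, next_state, a) <= threshold]
    # Always keep the dataset action so the max is never taken over an empty set.
    kept.append(list(data_next_action))
    best_q = max(q_fn(next_state, a) for a in kept)
    return reward + gamma * best_q


def make_model(k: float) -> Model:
    # Toy member s' = s + k * a; members differ only in k, so disagreement grows with |a|.
    return lambda s, a: [si + k * ai for si, ai in zip(s, a)]


models = [make_model(k) for k in (0.9, 1.0, 1.1)]


def toy_q(s: Sequence[float], a: Sequence[float]) -> float:
    # Toy critic that prefers small actions (always <= 0).
    return -sum(x * x for x in a)


target = constrained_target(1.0, [0.0], [0.2], toy_q, models, threshold=0.05)
print(target)
```

With this toy ensemble, a one-dimensional action `a` has uncertainty of roughly `0.0816 * |a|`, so a threshold of 0.05 admits only candidates with `|a|` below about 0.61. Keeping the dataset action in the candidate set is a design choice of this sketch that guarantees a well-defined backup even when every noisy candidate is rejected.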