Uncertainty-based Bootstrapped Optimization for Offline Reinforcement Learning

Tianyi Li, Genke Yang, Jian Chu
DOI: https://doi.org/10.1007/s13042-024-02439-2
2024-01-01
International Journal of Machine Learning and Cybernetics
Abstract: Offline reinforcement learning (offline RL) seeks to learn effective policies from previously collected, static datasets, with no further opportunity for exploration. However, offline RL faces significant challenges, chiefly function approximation errors caused by extrapolating to out-of-distribution (OOD) data points. In this work, we propose uncertainty-based bootstrapped optimization (UBO), which aims to address the distributional shift induced by fixed datasets. First, we use a bootstrapped architecture to implicitly approximate the epistemic uncertainty of the training instances. Then, we apply both implicit and explicit penalties to OOD data with high prediction uncertainty. Finally, we introduce a training paradigm based on the upper confidence bound (UCB) strategy for the bootstrapping updates, which enables the algorithm to thoroughly assess the varying performance of each bootstrapped head. We compare UBO with other prevailing offline RL algorithms on the D4RL benchmark. Experiments on various tasks show that the proposed algorithm outperforms or is competitive with the previous state of the art on most tasks.
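The abstract names three mechanisms (bootstrapped heads for epistemic uncertainty, penalties on high-uncertainty OOD data, and UCB-driven updates of the heads) without giving their equations. The following is a minimal Python sketch of how such mechanisms are commonly instantiated, under stated assumptions: Bernoulli bootstrap masks in the style of Bootstrapped DQN, the standard deviation across heads as the uncertainty estimate, a lower-confidence-bound penalty `mean - beta * std` as the explicit penalty, and UCB1 over heads driven by a made-up scalar performance score. All names (`bootstrap_masks`, `penalized_target`, `UCBHeadSelector`, `run_demo`, `beta`, `c`) are hypothetical; this is an illustration, not the authors' implementation.

```python
import math
import random
import statistics
from typing import List, Sequence


def bootstrap_masks(batch_size: int, num_heads: int, p: float,
                    rng: random.Random) -> List[List[int]]:
    """Assumed Bernoulli(p) masks: mask[k][i] == 1 means head k trains on transition i."""
    return [[1 if rng.random() < p else 0 for _ in range(batch_size)]
            for _ in range(num_heads)]


def ensemble_uncertainty(q_values: Sequence[float]) -> float:
    """Epistemic-uncertainty proxy: population std of the K heads' Q-estimates."""
    return statistics.pstdev(q_values)


def penalized_target(q_values: Sequence[float], beta: float) -> float:
    """Hypothetical explicit penalty (LCB form): mean Q minus beta * ensemble std.

    State-action pairs on which the heads disagree (typically OOD) get a lower value.
    """
    return statistics.fmean(q_values) - beta * ensemble_uncertainty(q_values)


class UCBHeadSelector:
    """UCB1 over bootstrapped heads, treating each head as a bandit arm."""

    def __init__(self, num_heads: int, c: float = math.sqrt(2.0)) -> None:
        self.c = c
        self.counts: List[int] = [0] * num_heads
        self.values: List[float] = [0.0] * num_heads  # running mean reward per head
        self.t = 0

    def select(self) -> int:
        self.t += 1
        # Try every head once so that the confidence bonus is well defined.
        for k, n in enumerate(self.counts):
            if n == 0:
                return k
        scores = [v + self.c * math.sqrt(math.log(self.t) / n)
                  for v, n in zip(self.values, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, head: int, reward: float) -> None:
        # Incremental mean of the rewards observed for this head.
        self.counts[head] += 1
        self.values[head] += (reward - self.values[head]) / self.counts[head]


def run_demo(seed: int = 0, steps: int = 500) -> UCBHeadSelector:
    """Toy run: the per-head 'quality' scores are made up for illustration."""
    rng = random.Random(seed)
    true_quality = [0.2, 0.5, 0.8]  # hypothetical, not from the paper
    selector = UCBHeadSelector(num_heads=len(true_quality))
    for _ in range(steps):
        k = selector.select()
        selector.update(k, true_quality[k] + rng.gauss(0.0, 0.1))
    return selector


if __name__ == "__main__":
    sel = run_demo()
    print("pulls per head:", sel.counts)
```

In this toy setting, `run_demo()` concentrates most updates on the head with the highest (invented) quality score while still revisiting the weaker heads occasionally; the exploration constant `c` controls that trade-off, and `beta` controls how pessimistic the penalized target is for actions on which the heads disagree.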