Offline Reinforcement Learning Via Optimal Transport and Improved Performance Difference Theorem

Boyi Wang, Kai Lin, Guohan Sun
DOI: https://doi.org/10.1109/ictai59109.2023.00097
2023-01-01
Abstract: Offline reinforcement learning (Offline RL) has garnered significant attention as a way to learn effective policies without real-time interaction with the environment. However, existing approaches perform poorly because they overestimate the values of out-of-distribution state-action pairs. Constraining the learned policy to stay close to the behavior policy is therefore crucial to avoid generating out-of-distribution actions and misestimating their values. In this study, we use optimal transport theory to assign rewards to offline datasets that lack reward labels. Leveraging the performance difference theorem, we study monotonic improvement over the behavior policy and restrict the ratio between the learned policy and the behavior policy, which keeps the two policies close. Additionally, we propose a mixing loss function to address challenges inherent in the compound action setting. Building on these components, we propose Optimal Transport Support Behavior Policy Optimization-Mixing Loss (OTSBPO-ML), a solution to the offline RL problem. Extensive experiments on the D4RL benchmark show that OTSBPO-ML outperforms state-of-the-art offline reinforcement learning algorithms by a significant margin.
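The abstract does not specify how optimal transport turns unlabeled transitions into rewards. What follows is a minimal sketch of one common scheme, an OTR-style labeling, and is not necessarily the authors' implementation. Each state of an unlabeled trajectory is matched to the states of a reference expert demonstration using entropy-regularized optimal transport (Sinkhorn iterations). The reward for each step is minus the transport cost that step incurs. Several choices here are hypothetical and made only for illustration: the squared-Euclidean cost, the regularization strength `epsilon`, the uniform marginals, the assumption that an expert demonstration is available, and the helper names `squared_euclidean`, `sinkhorn_plan`, and `ot_rewards`.

```python
import math
from typing import List, Sequence

# Hypothetical OT reward-labeling sketch. The cost function, epsilon, and the
# use of an expert demonstration as reference are assumptions, not the paper's.
Vector = Sequence[float]


def squared_euclidean(x: Vector, y: Vector) -> float:
    """Squared Euclidean distance between two state vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_plan(cost: List[List[float]], epsilon: float = 0.1,
                  n_iters: int = 200) -> List[List[float]]:
    """Entropy-regularized OT plan between two uniform empirical measures."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on each dataset state
    b = [1.0 / m] * m  # uniform mass on each expert state
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternate rescaling so row sums match a and column sums match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_rewards(trajectory: List[Vector], expert: List[Vector],
               epsilon: float = 0.1) -> List[float]:
    """Label each step of an unlabeled trajectory with a reward.

    reward_i = -sum_j P[i][j] * C[i][j], i.e. minus the transport cost that
    state i pays to be matched onto the expert demonstration.
    """
    cost = [[squared_euclidean(s, e) for e in expert] for s in trajectory]
    plan = sinkhorn_plan(cost, epsilon)
    return [-sum(p * c for p, c in zip(plan_row, cost_row))
            for plan_row, cost_row in zip(plan, cost)]


if __name__ == "__main__":
    expert = [[0.0], [1.0], [2.0]]
    near = [[0.0], [1.0], [2.0]]
    far = [[0.0], [3.0], [5.0]]
    # The near-expert trajectory should receive higher (less negative) rewards.
    print("near:", ot_rewards(near, expert))
    print("far: ", ot_rewards(far, expert))
```

States that can be transported cheaply onto the expert demonstration receive rewards near zero, and states that stray from it receive more negative rewards. The labeled dataset could then be passed to any reward-based offline RL learner. A smaller `epsilon` gives a sharper matching that is closer to unregularized OT, but it needs more iterations and makes underflow in `math.exp` more likely.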