Combined Constraint on Behavior Cloning and Discriminator in Offline Reinforcement Learning

Shunya Kidera, Kosuke Shintani, Toi Tsuneda, Satoshi Yamane
DOI: https://doi.org/10.1109/access.2024.3361030
IF: 3.9
2024-02-10
IEEE Access
Abstract: Reinforcement learning (RL) has received considerable attention in recent years because it can automatically learn optimal behavioral policies. However, because RL acquires a policy by repeatedly interacting with the environment, it is difficult to apply to realistic tasks. This has motivated extensive research on offline RL (batch RL), which learns from experience collected in advance instead of interacting with the environment. Applying common RL methods directly to the offline setting fails because of a problem called distribution shift, and methods for suppressing it have been actively studied. In this study, we propose a new offline RL algorithm that adds constraints from the discriminators used in Generative Adversarial Networks to the offline RL method TD3+BC. We compare the proposed method with existing methods on a benchmark for 3D robot control simulation. TD3+BC tightens its constraint to suppress distribution shift, but this makes successful learning difficult when the quality of the dataset is poor. The proposed approach retains the mechanism for mitigating distribution shift while introducing new constraints so that learning does not depend solely on the dataset's quality, with the aim of improving accuracy even when the dataset is of poor quality.
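To make the idea of a combined constraint concrete, here is a minimal sketch of what such an actor objective could look like. The TD3+BC part follows the commonly used form (a Q-value term scaled by lambda = alpha / mean|Q|, plus a mean squared behavior cloning error). The discriminator term is an assumption for illustration only: the abstract does not give the paper's formulation, so the penalty `-log D(s, pi(s))`, the weight `beta`, and all function names here are hypothetical.

```python
import math


def td3bc_actor_loss(q_values, policy_actions, data_actions, alpha=2.5):
    """TD3+BC-style actor loss: -lambda * mean(Q) + mean squared BC error.

    q_values:       Q(s_i, pi(s_i)) for each sample in the batch
    policy_actions: actions pi(s_i), one list of floats per sample
    data_actions:   dataset actions a_i, same shape as policy_actions
    """
    n = len(q_values)
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / (mean_abs_q + 1e-8)  # normalizes the scale of Q
    q_term = -lam * sum(q_values) / n
    action_dim = len(policy_actions[0])
    bc_term = sum(
        (p - a) ** 2
        for pa, da in zip(policy_actions, data_actions)
        for p, a in zip(pa, da)
    ) / (n * action_dim)
    return q_term + bc_term


def discriminator_penalty(disc_probs, eps=1e-8):
    """Hypothetical constraint: -mean(log D(s_i, pi(s_i))).

    disc_probs are discriminator outputs in (0, 1]; the penalty is near 0
    when the policy's actions look like dataset actions to the discriminator.
    """
    return -sum(math.log(max(d, eps)) for d in disc_probs) / len(disc_probs)


def combined_actor_loss(q_values, policy_actions, data_actions, disc_probs,
                        alpha=2.5, beta=1.0):
    """Assumed combination: TD3+BC loss plus a beta-weighted discriminator term."""
    return (td3bc_actor_loss(q_values, policy_actions, data_actions, alpha)
            + beta * discriminator_penalty(disc_probs))
```

In this sketch, lowering `alpha` tightens the behavior cloning constraint, while `beta` controls how strongly the discriminator term pulls the policy toward actions it judges to be in-distribution. How the paper actually weights or trains these terms is not stated in the abstract.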
computer science, information systems; telecommunications; engineering, electrical & electronic