Abstract: Offline reinforcement learning has received extensive attention because it learns a policy from a static dataset and thus avoids interaction between the agent and the environment. However, general reinforcement learning methods do not achieve satisfactory results in the offline setting because of out-of-distribution state-action pairs that the dataset does not cover during training. To address this problem, policy regularization methods, which try to directly clone the policies that generated the static dataset, have been widely studied for their simplicity and effectiveness. However, policy constraint methods force the agent to choose the action recorded for the corresponding state in the static dataset. This type of constraint is usually over-conservative and results in suboptimal policies, especially on low-quality static datasets. In this paper, a hypercube policy regularization framework is proposed. This method relaxes the constraints of policy constraint methods by allowing the agent to explore the actions corresponding to similar states in the static dataset, which increases the effectiveness of algorithms on low-quality datasets. It is also shown theoretically that the hypercube policy regularization framework can effectively improve the performance of the original algorithms. In addition, the hypercube policy regularization framework is combined with TD3-BC and Diffusion-QL, and the resulting algorithms, called TD3-BC-C and Diffusion-QL-C, are evaluated on D4RL datasets. The resulting scores show that TD3-BC-C and Diffusion-QL-C perform better than state-of-the-art algorithms such as IQL, CQL, TD3-BC and Diffusion-QL in most D4RL environments, in approximately the same running time.
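The idea of exploring "the actions corresponding to similar states" can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "similar" means "falling in the same axis-aligned hypercube of a uniform grid over the state space", and that the regularization target for a state is the candidate action from its hypercube with the highest critic estimate. The names `hypercube_index`, `build_cells`, `regularization_targets`, `toy_q`, and the parameters `low` and `width` are hypothetical and chosen only for illustration.

```python
import numpy as np
from collections import defaultdict


def hypercube_index(states, low, width):
    """Integer grid coordinates of the hypercube containing each state."""
    return np.floor((states - low) / width).astype(np.int64)


def build_cells(states, low, width):
    """Group dataset row indices by the hypercube their state falls in."""
    cells = defaultdict(list)
    for i, key in enumerate(map(tuple, hypercube_index(states, low, width))):
        cells[key].append(i)
    return cells


def regularization_targets(states, actions, q_fn, low, width):
    """For each dataset state, choose the regularization target among the
    dataset actions whose states share its hypercube (assumed criterion:
    highest critic value at that state)."""
    targets = np.empty_like(actions)
    for members in build_cells(states, low, width).values():
        members = np.asarray(members)
        candidates = actions[members]
        for i in members:
            # Evaluate every candidate action at the current state.
            repeated = np.repeat(states[i][None, :], len(candidates), axis=0)
            scores = q_fn(repeated, candidates)
            targets[i] = candidates[np.argmax(scores)]
    return targets


def toy_q(states, actions):
    """Hypothetical critic: prefers actions close to 0.5, ignores the state."""
    return -(actions[:, 0] - 0.5) ** 2


# Toy dataset: the first two states share a hypercube, the third is alone.
states = np.array([[0.05, 0.10], [0.20, 0.15], [0.80, 0.90]])
actions = np.array([[0.1], [0.7], [-0.3]])
targets = regularization_targets(
    states, actions, toy_q, low=np.zeros(2), width=0.25
)
```

In this toy run the first two transitions are both regularized toward the action `0.7`, while the third keeps its own action because its hypercube contains no alternative. With a very small `width` every state sits alone in its hypercube and the sketch reduces to plain behavior cloning of the recorded action; a larger `width` widens the candidate set.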

Combined Constraint on Behavior Cloning and Discriminator in Offline Reinforcement Learning

Offline Reinforcement Learning with Diffusion-Based Behavior Cloning Term

TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets

When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?

Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning

Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL

Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage

Robust Offline Reinforcement Learning from Low-Quality Data

Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies

SelfBC: Self Behavior Cloning for Offline Reinforcement Learning

Offline Reinforcement Learning with Behavioral Supervisor Tuning

Offline–Online Actor–Critic

Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints

State-Constrained Offline Reinforcement Learning

Goal-Conditioned Offline Reinforcement Learning Through State Space Partitioning

Hypercube Policy Regularization Framework for Offline Reinforcement Learning

Improving Offline Reinforcement Learning with Inaccurate Simulators

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

Reining Generalization in Offline Reinforcement Learning Via Representation Distinction

Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling