Abstract: Continuous robotic control with sparse rewards is a longstanding challenge in deep reinforcement learning (DRL). Although existing DRL algorithms have made great progress in learning policies from visual observations, learning an effective policy still requires an impractical number of real-world samples. Moreover, some robotic tasks are naturally specified with sparse rewards, which makes the collected data less informative and slows learning, to the point that DRL can become infeasible. Manually shaping reward functions is also difficult, because it requires domain-specific knowledge and human intervention. To alleviate these issues, this paper proposes TD3MHER, a model-free, off-policy RL approach for learning manipulation policies for continuous robotic tasks with sparse rewards. Specifically, TD3MHER combines the Twin Delayed Deep Deterministic policy gradient algorithm (TD3) with Model-driven Hindsight Experience Replay (MHER) to achieve highly sample-efficient training. While the agent learns the policy, TD3MHER also helps it learn the potential physical model of the robot, which helps solve the task, and it does not require any additional robot-environment interactions. TD3MHER is evaluated on a simulated robotic task with a 7-DOF manipulator, where it is compared against a previous DRL algorithm to verify its usefulness. The experimental results show that the proposed approach successfully exploits previously stored samples with sparse rewards and learns faster.
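
The abstract does not spell out how MHER works. As a rough illustration of the general hindsight idea only, the sketch below shows plain hindsight experience replay with "final"-goal relabeling, not the model-driven variant proposed in the paper. The transition layout, the 0/-1 reward convention, the tolerance `tol`, and all function names are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]
# Hypothetical transition layout:
# (state, action, next_state, achieved_goal, desired_goal, reward)
Transition = Tuple[Vec, Vec, Vec, Vec, Vec, float]


def sparse_reward(achieved: Sequence[float], goal: Sequence[float], tol: float = 0.05) -> float:
    """Return 0.0 if the achieved goal lies within `tol` of the goal, else -1.0."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition], tol: float = 0.05) -> List[Transition]:
    """Copy an episode, substituting the goal achieved at the last step for the desired goal."""
    final_goal = episode[-1][3]
    relabeled = []
    for state, action, next_state, achieved, _old_goal, _old_reward in episode:
        # Recompute the sparse reward against the substituted goal.
        reward = sparse_reward(achieved, final_goal, tol)
        relabeled.append((state, action, next_state, achieved, final_goal, reward))
    return relabeled


# A failed two-step episode in one dimension: the desired goal 1.0 is never reached.
episode: List[Transition] = [
    ((0.0,), (0.1,), (0.1,), (0.1,), (1.0,), -1.0),
    ((0.1,), (0.1,), (0.2,), (0.2,), (1.0,), -1.0),
]

# Store both the original and the relabeled transitions; no new
# robot-environment interaction is needed to produce the relabeled ones.
replay_buffer = episode + relabel_with_final_goal(episode)
```

In this toy episode every original transition carries a reward of -1, while the relabeled copy of the last transition carries a reward of 0, so an off-policy learner such as TD3 would see at least one successful sample drawn from data it had already collected.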

Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization

Continual Domain Randomization

A Domain Data Pattern Randomization Based Deep Reinforcement Learning Method for Sim-to-Real Transfer

Real-time Policy Distillation in Deep Reinforcement Learning

Understanding Domain Randomization for Sim-to-real Transfer

Policy Transfer with Strategy Optimization

Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulation for Time-Efficient Fine-Resolution Policy Learning

How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

Sim-to-Real Policy and Reward Transfer with Adaptive Forward Dynamics Model

Experience Consistency Distillation Continual Reinforcement Learning for Robotic Manipulation Tasks

One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation

Demo: Curricular Reinforcement Learning for Robust Policy in Unmanned CarRacing Game

Adaptability Preserving Domain Decomposition for Stabilizing Sim2Real Reinforcement Learning

DROID: Minimizing the Reality Gap Using Single-Shot Human Demonstration

RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

Variance Reduced Domain Randomization for Reinforcement Learning With Policy Gradient

Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards

Cross Domain Policy Transfer with Effect Cycle-Consistency

Cloud-Edge Training Architecture for Sim-to-Real Deep Reinforcement Learning

Evolutionary Stochastic Policy Distillation

Generating Automatic Curricula via Self-Supervised Active Domain Randomization