Abstract: Robotic continuous control with sparse rewards is a longstanding challenge in deep reinforcement learning (DRL). While existing DRL algorithms have demonstrated great progress in learning policies from visual observations, learning effective policies still requires an impractical number of real-world data samples. Moreover, some robotic tasks are naturally specified with sparse rewards, which leads to inefficient use of the precious data and slows down learning, making DRL infeasible. In addition, manually shaping reward functions is complex because it requires specific domain knowledge and human intervention. To alleviate these issues, this paper proposes a model-free, off-policy RL approach named TD3MHER to learn manipulation policies for continuous robotic tasks with sparse rewards. Specifically, TD3MHER combines the Twin Delayed Deep Deterministic policy gradient algorithm (TD3) with Model-driven Hindsight Experience Replay (MHER) to achieve highly sample-efficient training. While the agent is learning the policy, TD3MHER also helps it learn the potential physical model of the robot, which is helpful for solving the task, and this does not require any new robot-environment interactions. The performance of TD3MHER is assessed on a simulated robotic task using a 7-DOF manipulator, both to compare the proposed technique with a previous DRL algorithm and to verify the usefulness of our method. The experimental results on the simulated robotic task show that the proposed approach successfully utilizes previously stored samples with sparse rewards and learns faster.
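
To illustrate why hindsight replay helps with sparse rewards, the following is a minimal sketch of generic hindsight goal relabeling using the "final" strategy. It is not the model-driven variant (MHER) proposed here, whose details the abstract does not give. The function names, the transition layout, the tolerance value, and the 0 / -1 reward convention are all assumptions made for illustration.

```python
import math

def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse reward: 0.0 within tol of the goal, otherwise -1.0."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0

def relabel_with_final_goal(episode, tol=0.05):
    """Relabel each stored transition with the goal the episode actually reached.

    episode: list of dicts with keys "obs", "action", "achieved", "goal"
    (a hypothetical layout). "achieved" is the position reached after the action.
    No new environment interaction is needed; only stored data is reused.
    """
    hindsight_goal = episode[-1]["achieved"]
    relabeled = []
    for t in episode:
        relabeled.append({
            "obs": t["obs"],
            "action": t["action"],
            "achieved": t["achieved"],
            "goal": hindsight_goal,
            "reward": sparse_reward(t["achieved"], hindsight_goal, tol),
        })
    return relabeled

# A failed three-step episode: the original goal (1.0, 1.0) is never reached.
episode = [
    {"obs": (0.0, 0.0), "action": (0.1, 0.0), "achieved": (0.1, 0.0), "goal": (1.0, 1.0)},
    {"obs": (0.1, 0.0), "action": (0.1, 0.1), "achieved": (0.2, 0.1), "goal": (1.0, 1.0)},
    {"obs": (0.2, 0.1), "action": (0.1, 0.1), "achieved": (0.3, 0.2), "goal": (1.0, 1.0)},
]

original_rewards = [sparse_reward(t["achieved"], t["goal"]) for t in episode]
hindsight_rewards = [t["reward"] for t in relabel_with_final_goal(episode)]
print(original_rewards)   # every step is penalized under the original goal
print(hindsight_rewards)  # the last step counts as a success under the relabeled goal
```

Under these assumptions, the relabeled transitions can be added to the replay buffer of an off-policy learner such as TD3 alongside the original ones, so a failed episode still yields an informative reward signal.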

Efficient Hindsight Reinforcement Learning Using Demonstrations for Robotic Tasks with Sparse Rewards

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward

Overcoming Exploration in Reinforcement Learning with Demonstrations

Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards.

Achieving Sample-Efficient Learning of Long-Horizon Sparse-Reward Robotic Tasks with Base Controllers

Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

Deep Reinforcement Learning for an Anthropomorphic Robotic Arm under Sparse Reward Tasks

Improvements on Hindsight Learning

Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations

Addressing Reward Engineering For Deep Reinforcement Learning On Multi-Stage Task

Residual Reinforcement Learning from Demonstrations

Sample Efficiency in Sparse Reinforcement Learning: Or Your Money Back

Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning

Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models

Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations

Sparse Reward Based Manipulator Motion Planning by Using High Speed Learning from Demonstrations

A Study on Dense and Sparse (Visual) Rewards in Robot Policy Learning

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers