Abstract: This paper describes a deep reinforcement learning (DRL) approach that won Phase 1 of the Real Robot Challenge (RRC) 2021, and then extends this method to a more difficult manipulation task. Phase 1 of the RRC required a TriFinger robot to move a cube along a specified positional trajectory, with no requirement on the cube's orientation. We used a relatively simple reward function, combining a goal‐based sparse reward with a distance reward, together with Hindsight Experience Replay (HER) to guide the learning of the DRL agent (Deep Deterministic Policy Gradient [DDPG]). Our approach allowed our agents to acquire dexterous robotic manipulation strategies in simulation. These strategies were then deployed on the real robot and, in the final evaluation stage of the RRC, outperformed all other competition submissions, including those using more traditional robotic control techniques. Here we extend this method by modifying the Phase 1 task so that the robot must also hold the cube in a particular orientation while moving it along the required positional trajectory. The added orientation requirement increases the complexity of the problem and makes the agent less able to learn the task through blind exploration. To circumvent this issue, we make novel use of a Knowledge Transfer (KT) technique that allows the strategies learned by the agent in the original task (which was agnostic to cube orientation) to be transferred to this task (where orientation matters). KT allowed the agent to learn and perform the extended task in the simulator, reducing the average positional deviation from 0.134 m to 0.02 m and the average orientation deviation from 142° to 76° during evaluation. This KT concept shows good generalization properties and could be applied to any actor‐critic learning algorithm.
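The reward described in the abstract (a goal‐based sparse term combined with a distance term, used together with HER) can be illustrated with a minimal sketch. This is not the authors' implementation: the tolerance `GOAL_TOLERANCE_M`, the weight `DISTANCE_WEIGHT`, the transition dictionary layout, and the choice of the "final" HER relabeling strategy are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical constants; the abstract does not state the threshold or weighting.
GOAL_TOLERANCE_M = 0.02   # assumed success radius around the goal position
DISTANCE_WEIGHT = 1.0     # assumed weight on the dense distance term


def reward(achieved_pos, goal_pos, tol=GOAL_TOLERANCE_M, weight=DISTANCE_WEIGHT):
    """Sparse goal term (0 inside the tolerance, -1 outside) plus a distance penalty."""
    dist = float(np.linalg.norm(np.asarray(achieved_pos) - np.asarray(goal_pos)))
    sparse = 0.0 if dist <= tol else -1.0
    return sparse - weight * dist


def her_relabel_final(episode):
    """Relabel an episode with the position it actually reached as the goal.

    Each transition is assumed to be a dict with the key 'achieved_next'
    (cube position after the action) alongside 'goal' and 'reward'.
    """
    new_goal = episode[-1]["achieved_next"]
    return [
        {**t, "goal": new_goal, "reward": reward(t["achieved_next"], new_goal)}
        for t in episode
    ]


# Toy episode that never reaches the commanded goal at x = 0.5 m.
goal = [0.5, 0.0, 0.0]
episode = [
    {"achieved_next": [0.1, 0.0, 0.0], "goal": goal},
    {"achieved_next": [0.2, 0.0, 0.0], "goal": goal},
]
episode = [{**t, "reward": reward(t["achieved_next"], t["goal"])} for t in episode]
relabeled = her_relabel_final(episode)
```

In this toy episode every original transition carries the sparse penalty, while the relabeled final transition receives no penalty because the substituted goal is the position the cube actually reached. Under these assumptions, this is the mechanism by which HER gives a sparse-reward agent a learning signal from episodes that failed to reach the commanded goal.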

Deep Reinforcement Learning for an Anthropomorphic Robotic Arm under Sparse Reward Tasks

Deep Reinforcement Learning for the Improvement of Robot Manipulation Skills under Sparse Reward

Ensemble Bootstrapped Deep Deterministic Policy Gradient For Vision-Based Robotic Grasping

Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards

Achieving Sample-Efficient Learning of Long-Horizon Sparse-Reward Robotic Tasks with Base Controllers

Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

Efficient Hindsight Reinforcement Learning Using Demonstrations for Robotic Tasks with Sparse Rewards

Addressing Reward Engineering For Deep Reinforcement Learning On Multi-Stage Task

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations

Path planning of robotic arm based on deep reinforcement learning algorithm

Task-Oriented Deep Reinforcement Learning for Robotic Skill Acquisition and Control

Research on Complex Robot Manipulation Tasks Based on Hindsight Trust Region Policy Optimization

Dexterous robotic manipulation using deep reinforcement learning and knowledge transfer for complex sparse reward‐based tasks

A Study on Dense and Sparse (Visual) Rewards in Robot Policy Learning

Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates

Training Musculoskeletal Arm Play Taichi with Deep Reinforcement Learning

Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward

Robotic Grasping Training Using Deep Reinforcement Learning with Policy Guidance Mechanism

Deep Reinforcement Learning Based Robot Arm Manipulation with Efficient Training Data through Simulation