Abstract: Sharing prior knowledge across multiple robotic manipulation tasks is a challenging research topic. Although state-of-the-art deep reinforcement learning (DRL) algorithms have shown immense success in single robotic tasks, it remains difficult to apply them directly to multi-task manipulation problems. This difficulty stems mostly from the challenge of efficient exploration in high-dimensional state spaces and continuous action spaces. Furthermore, in multi-task scenarios, the sparse-reward problem and the sample inefficiency of DRL algorithms are exacerbated. We therefore propose a method that increases the sample efficiency of the soft actor-critic (SAC) algorithm and extends it to a multi-task setting. The agent learns a prior policy from two structurally similar tasks and adapts that policy to a target task. We propose prioritized hindsight with dual experience replay to improve data storage and sampling, which in turn helps the agent perform structured exploration and thereby improves sample efficiency. The proposed method splits the experience replay buffer into two buffers, one for real trajectories and one for hindsight trajectories, to reduce the bias that hindsight trajectories introduce into the buffer. Moreover, we reuse high-reward transitions from previous tasks to help the network adapt to the new task. We demonstrate the proposed method on several manipulation tasks using a 7-DoF robotic arm in RLBench. The experimental results show that the proposed method outperforms vanilla SAC in both single-task and multi-task settings.
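
The storage and sampling scheme described above (real and hindsight trajectories kept in separate buffers, prioritized sampling, and reuse of high-reward transitions from an earlier task) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `DualHindsightReplay`, `PrioritizedBuffer`, `sparse_reward` and `import_high_reward` are hypothetical. Relabelling uses the "final" strategy from Hindsight Experience Replay, and priorities follow proportional prioritized replay driven by TD errors. The 0/-1 sparse reward, the 50/50 real-to-hindsight batch ratio, and the reward threshold for reusing prior-task transitions are all assumptions, since the abstract does not specify them.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


@dataclass
class Transition:
    obs: Vec
    action: Vec
    reward: float
    next_obs: Vec
    goal: Vec      # goal the agent was conditioned on
    achieved: Vec  # goal actually achieved after this step


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0 when within tol of the goal, -1 otherwise."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


class PrioritizedBuffer:
    """Ring buffer with proportional prioritized sampling, P(i) ~ p_i ** alpha."""

    def __init__(self, capacity: int, alpha: float = 0.6):
        self.capacity, self.alpha, self.pos = capacity, alpha, 0
        self.items: List[Transition] = []
        self.priorities: List[float] = []

    def add(self, t: Transition) -> None:
        p = max(self.priorities, default=1.0)  # new data gets the max priority seen so far
        if len(self.items) < self.capacity:
            self.items.append(t)
            self.priorities.append(p)
        else:  # buffer full: overwrite the oldest slot
            self.items[self.pos], self.priorities[self.pos] = t, p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, k: int, rng: random.Random) -> List[int]:
        if not self.items or k <= 0:
            return []
        weights = [p ** self.alpha for p in self.priorities]
        return rng.choices(range(len(self.items)), weights=weights, k=k)

    def update(self, idxs: Sequence[int], td_errors: Sequence[float], eps: float = 1e-3) -> None:
        # Assumed priority measure: |TD error| + eps, as in proportional PER.
        for i, e in zip(idxs, td_errors):
            self.priorities[i] = abs(e) + eps


class DualHindsightReplay:
    """Real and hindsight-relabelled transitions live in separate buffers."""

    def __init__(self, capacity: int, real_ratio: float = 0.5, seed: int = 0):
        self.real = PrioritizedBuffer(capacity)
        self.hindsight = PrioritizedBuffer(capacity)
        self.real_ratio = real_ratio  # assumed share of real data per batch
        self.rng = random.Random(seed)

    def store_episode(self, episode: List[Transition]) -> None:
        final_goal = episode[-1].achieved  # HER "final" relabelling strategy
        for t in episode:
            self.real.add(t)  # unmodified real transition
            r = sparse_reward(t.achieved, final_goal)
            self.hindsight.add(Transition(t.obs, t.action, r, t.next_obs, final_goal, t.achieved))

    def sample(self, batch_size: int) -> List[Tuple[str, int, Transition]]:
        # A fixed real/hindsight mix caps the share of relabelled data in each batch.
        n_real = round(batch_size * self.real_ratio)
        batch = [("real", i, self.real.items[i]) for i in self.real.sample(n_real, self.rng)]
        batch += [("hindsight", i, self.hindsight.items[i])
                  for i in self.hindsight.sample(batch_size - n_real, self.rng)]
        return batch

    def import_high_reward(self, prior: "DualHindsightReplay", threshold: float = 0.0) -> int:
        """Seed this task's real buffer with high-reward transitions from a previous task."""
        kept = [t for t in prior.real.items if t.reward >= threshold]
        for t in kept:
            self.real.add(t)
        return len(kept)
```

In a SAC training loop one would call `store_episode` at the end of each episode, draw mixed batches with `sample`, and feed the critic's TD errors back through `update` on the buffer named in each sampled tuple. Keeping relabelled copies in their own buffer means the real buffer's goal distribution is never altered by hindsight goals; only the sampling ratio decides how much relabelled data the learner sees.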

Improvements on Hindsight Learning

Efficient Hindsight Reinforcement Learning Using Demonstrations for Robotic Tasks with Sparse Rewards

Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning

Hindsight Planner

Exploration via Hindsight Goal Generation

Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

Generating Attentive Goals for Prioritized Hindsight Reinforcement Learning

Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency

Consistent Experience Replay in High-Dimensional Continuous Control with Decayed Hindsights

Hindsight PRIORs for Reward Learning from Human Preferences

Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning

Hindsight Experience Replay Accelerates Proximal Policy Optimization

Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning

Hindsight Trust Region Policy Optimization

Sample Efficiency in Sparse Reinforcement Learning: Or Your Money Back

ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

Soft Hindsight Experience Replay

Bias-reduced Multi-step Hindsight Experience Replay for Efficient Multi-goal Reinforcement Learning

MHER: Model-based Hindsight Experience Replay

Addressing Hindsight Bias in Multigoal Reinforcement Learning