Hindsight Planner

Yaqing Lai, Wufan Wang, Yunjie Yang, Jihong Zhu, Minchi Kuang
DOI: https://doi.org/10.5555/3398761.3398844
2020-01-01
Abstract: Goal-oriented reinforcement learning enables agents to accomplish varied goals, which is crucial for robotic tasks. However, the sparse rewards in these tasks aggravate sample inefficiency. Hindsight Experience Replay (HER) was introduced to improve sample efficiency by imagining hindsight virtual goals for unsuccessful trajectories, which mitigates the long-term dominance of negative rewards. Nevertheless, a gap remains between the distribution of hindsight goals and the desired goals of the tasks, and HER narrows it only through a great deal of aimless exploration. In this paper, we propose Hindsight Planner (HP), which generates several subgoals that guide the agent step by step toward the desired goal, allowing it to exploit the local knowledge it has learned from achieved goals. The planner uses historical trajectories to learn the structure of the feasible goal space and then generalizes this knowledge to unseen goals. We have extensively evaluated our framework on a number of robotic tasks and show substantial improvements over the original HER in both sample efficiency and converged performance.
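To make the hindsight-goal idea concrete, here is a minimal sketch of HER-style relabeling using the "future" strategy: each stored transition is duplicated `k` times with its desired goal replaced by a goal actually achieved later in the same episode, so some failed transitions become successes. This illustrates plain HER only, not the Hindsight Planner, whose subgoal planner the abstract does not specify. The names `Transition`, `sparse_reward`, `her_relabel`, the tolerance `eps`, the choice `k=4`, and the 0/-1 reward convention are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

Goal = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:  # hypothetical container for one replay-buffer entry
    obs: Tuple[float, ...]
    action: Tuple[float, ...]
    achieved_goal: Goal   # goal reached after taking `action`
    desired_goal: Goal    # goal the agent was conditioned on
    reward: float


def sparse_reward(achieved: Goal, desired: Goal, eps: float = 0.05) -> float:
    """Assumed sparse convention: 0 on success (within eps), -1 otherwise."""
    return 0.0 if math.dist(achieved, desired) < eps else -1.0


def her_relabel(episode: List[Transition], k: int = 4, eps: float = 0.05,
                rng: Optional[random.Random] = None) -> List[Transition]:
    """'future' relabeling: for every step t, add k copies whose desired goal
    is the achieved goal of some step t' >= t in the same episode."""
    rng = rng or random.Random()
    T = len(episode)
    out: List[Transition] = []
    for t, tr in enumerate(episode):
        out.append(tr)  # keep the original (usually failed) transition
        for _ in range(k):
            future = rng.randint(t, T - 1)  # inclusive on both ends
            g = episode[future].achieved_goal
            out.append(replace(tr, desired_goal=g,
                               reward=sparse_reward(tr.achieved_goal, g, eps)))
    return out


# Usage: a 3-step episode that never reaches the desired goal (1.0, 1.0).
desired = (1.0, 1.0)
achieved = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]
episode = [Transition((0.0,), (0.0,), a, desired, sparse_reward(a, desired))
           for a in achieved]
relabeled = her_relabel(episode, k=4, rng=random.Random(0))
```

Every original transition carries reward -1, yet the relabeled copies whose hindsight goal equals their own achieved goal receive reward 0. This is the mechanism by which HER counters the dominance of negative rewards; the gap the abstract describes arises because these hindsight goals stay close to where the agent has already been rather than near the desired goal.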
What problem does this paper attempt to address?