TendencyRL: Multi-stage Discriminative Hints for Efficient Goal-Oriented Reverse Curriculum Learning

Chen Wang, Junfeng Ding, Xiangyu Chen, Zelin Ye, Jialu Wang, Ziruo Cai, Cewu Lu
DOI: https://doi.org/10.1109/iros40897.2019.8968248
2019-01-01
Abstract: Deep reinforcement learning (RL) algorithms have proven successful in a variety of simulated tasks with dense reward feedback. However, real-world RL applications, e.g., robotic manipulation, remain challenging because most of them are multi-stage and a positive reward is received only when the final goal is accomplished. In this work, we propose a potential solution to such problems by introducing an experience-based tendency reward shaping mechanism, which provides the robot with additional hints based on discriminative learning on past experience. This reward, together with a stage-awareness network, helps accelerate solving a multi-stage task that is split into shorter phases in a reverse curriculum learning manner. We extensively study the advantages of TendencyRL (TRL) on standard long-term goal-oriented robotics domains such as pick-and-place, and show that TRL performs more efficiently and robustly than prior approaches in tasks with large state spaces. In addition, we demonstrate that TRL can solve difficult robot manipulation challenges directly from perception.
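The abstract describes the mechanism only at a high level. The sketch below illustrates the general idea under stated assumptions; it is not the authors' implementation. The assumptions are: the discriminator is a plain linear logistic regression, whereas the abstract does not specify the paper's model. States are labeled 1 if they came from an episode that reached the goal and 0 otherwise. The hint is added to the sparse reward as `scale * P(success | state)`. The reverse curriculum advances to a start phase further from the goal once a fixed success-rate threshold is met. All names (`TendencyDiscriminator`, `shaped_reward`, `next_stage`, `scale`, `threshold`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch only: a linear logistic-regression "tendency"
# discriminator stands in for the paper's discriminative model, whose
# architecture is not specified in the abstract.


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TendencyDiscriminator:
    """Predicts P(success | state) from past experience.

    Assumed labels: 1 if the state came from an episode that reached the
    goal, 0 if the episode failed.
    """

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob(self, state: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, state))
        return _sigmoid(z)

    def fit(self, data: List[Tuple[Sequence[float], int]], epochs: int = 500) -> None:
        # Plain SGD on the logistic (cross-entropy) loss.
        for _ in range(epochs):
            for state, label in data:
                err = self.prob(state) - label  # d(loss)/d(logit)
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, state)]
                self.b -= self.lr * err


def shaped_reward(env_reward: float, state: Sequence[float],
                  disc: TendencyDiscriminator, scale: float = 0.1) -> float:
    # Sparse task reward plus a hint proportional to the predicted
    # probability that this state leads to success (assumed form).
    return env_reward + scale * disc.prob(state)


def next_stage(stage: int, success_rate: float, n_stages: int,
               threshold: float = 0.8) -> int:
    # Reverse curriculum: stage 0 starts closest to the goal; move the start
    # states one phase further back once the agent succeeds often enough.
    if success_rate >= threshold and stage < n_stages - 1:
        return stage + 1
    return stage
```

For example, suppose the discriminator is fitted on 1-D states where successful episodes cluster near 1.0 and failures cluster near 0.0. Then `shaped_reward(0.0, [0.9], disc)` exceeds `shaped_reward(0.0, [0.1], disc)`, so the agent gets a denser learning signal before the sparse goal reward ever arrives. A state-based bonus like this is not potential-based and can change the optimal policy. The abstract does not state how the paper accounts for that.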