Sim-to-Real Policy and Reward Transfer with Adaptive Forward Dynamics Model

R. Gomez, Keisuke Nakamura, Rongshun Juan, Jie Huang, Hao Ju, Guangliang Li
DOI: https://doi.org/10.1109/ICRA48891.2023.10161298
2023-05-29
Abstract: Deep reinforcement learning has shown promise in learning robust skills for robot control, but it typically requires a large number of samples to achieve good performance. Sim-to-real transfer learning has been developed to address this problem, but a policy trained in simulation usually performs unsatisfactorily in the real world because simulators inevitably model the dynamics of reality imperfectly. To enable sample-efficient learning in the real world, we propose progressive policy transfer with adaptive dynamics model (PPTADM). PPTADM assumes that the dynamics of the simulation and the real world do not match but that the state space is the same. It transfers the policy from simulation via a progressive neural network (PNN) and further improves the policy with a learned forward dynamics model in reality. In addition, for real-world tasks in which reward functions are difficult or even impossible to define and to verify for effectiveness, PPTADM can learn in the real world solely from a transferred reward function that is estimated from simulation, even though the dynamics do not match. Our results on five simulated tasks and on a real robot arm show that PPTADM can significantly improve the robot's learning efficiency and performance in the real world.
Computer Science, Engineering