Cross-domain policy adaptation with dynamics alignment

Haiyuan Gui,Shanchen Pang,Shihang Yu,Sibo Qiao,Yufeng Qi,Xiao He,Min Wang,Xue Zhai
DOI: https://doi.org/10.1016/j.neunet.2023.08.025
IF: 7.8
2023-08-23
Neural Networks
Abstract:The implementation of robotic reinforcement learning is hampered by problems such as an unspecified reward function and high training costs. Many previous works have used cross-domain policy transfer to obtain the policy of the problem domain. However, these researches require paired and aligned dynamics trajectories or other interactions with the environment. We propose a cross-domain dynamics alignment framework for the problem domain policy acquisition that can transfer the policy trained in the source domain to the problem domain. Our framework aims to learn dynamics alignment across two domains that differ in agents' physical parameters (armature, rotation range, or torso mass) or agents' morphologies (limbs). Most importantly, we learn dynamics alignment between two domains using unpaired and unaligned dynamics trajectories. For these two scenarios, we propose a cross-physics-domain policy adaptation algorithm (CPD) and a cross-morphology-domain policy adaptation algorithm (CMD) based on our cross-domain dynamics alignment framework. In order to improve the performance of policy in the source domain so that a better policy can be transferred to the problem domain, we propose the Boltzmann TD3 (BTD3) algorithm. We conduct diverse experiments on agent continuous control domains to demonstrate the performance of our approaches. Experimental results show that our approaches can obtain better policies and higher rewards for the agents in the problem domains even when the dataset of the problem domain is small.
computer science, artificial intelligence,neurosciences
What problem does this paper attempt to address?