Adaptability Preserving Domain Decomposition for Stabilizing Sim2Real Reinforcement Learning

Haichuan Gao, Zhile Yang, Xin Su, Tian Tan, Feng Chen
DOI: https://doi.org/10.1109/iros45743.2020.9341124
2020-01-01
Abstract: In sim-to-real transfer of Reinforcement Learning (RL) policies for robot tasks, Domain Randomization (DR) is a widely used technique for improving adaptability. However, DR creates a conflict between adaptability and training stability: heavy DR tends to make training unstable or cause it to fail outright. To relieve this conflict, we propose a new algorithm named Domain Decomposition (DD) that decomposes the randomized domain according to environments and trains a separate RL policy for each part. This decomposition stabilizes the training of each RL policy, and, as we prove theoretically, the adaptability of the overall policy can be preserved. Our simulation results verify that DD improves training stability while preserving ideal adaptability. Further, we complete a complex real-world vision-based patrolling task using DD, which demonstrates DD’s practicality. A video is attached as supplementary material.
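The decomposition idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the randomized domain is a single scalar parameter (for example, a friction coefficient) split into equal-width parts, and `train_policy` is a hypothetical placeholder that stands in for a full RL training run. The names `decompose_domain`, `train_policy`, and `select_part`, and the rule for choosing a part at deployment, are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Interval = Tuple[float, float]


def decompose_domain(low: float, high: float, parts: int) -> List[Interval]:
    """Split the randomized range [low, high] into equal-width sub-intervals."""
    width = (high - low) / parts
    return [(low + i * width, low + (i + 1) * width) for i in range(parts)]


def train_policy(interval: Interval, episodes: int, rng: random.Random) -> float:
    """Hypothetical stand-in for RL training on one part of the domain.

    Randomization is restricted to `interval`, so each "policy" only sees
    a narrow slice of the full domain. The returned value is just the mean
    sampled parameter, used here as a placeholder for trained weights.
    """
    lo, hi = interval
    samples = [rng.uniform(lo, hi) for _ in range(episodes)]
    return sum(samples) / len(samples)


def select_part(intervals: List[Interval], value: float) -> int:
    """Return the index of the sub-interval containing `value` (assumed rule)."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= value <= hi:
            return i
    raise ValueError(f"{value} lies outside the decomposed domain")


rng = random.Random(0)
intervals = decompose_domain(0.2, 1.0, parts=4)
# One separately trained placeholder policy per part of the domain.
policies = [train_policy(iv, episodes=100, rng=rng) for iv in intervals]
```

At deployment, this sketch would call `select_part` with an estimate of the environment parameter and run the matching entry of `policies`. How the overall policy is actually assembled from the per-part policies is not stated in the abstract.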