Expert-guided Policy Optimization by Latent Space Planning with Attention

Shiqing Gao, Fufei Yao, Yaoru Sun, Haibo Shi
DOI: https://doi.org/10.1109/ictai52525.2021.00020
2021-01-01
Abstract: Planning in learned dynamics models has shown great potential for improving sample efficiency. However, learning a latent representation of high-dimensional observations that is suitable for modeling the dynamics is still challenging. We propose a model-based policy gradient method that quickly learns an optimal policy directly from pixel frames. First, the dynamics model is learned with a squeeze-and-excitation (SENet) architecture, which explicitly incorporates attention and gating mechanisms to differentiate features for the latent representation. Second, in the early stage of training, the policy is guided by successful expert experience to extract the experts' intentions and thereby speed up convergence. Finally, we use model rollouts to reduce the bias of the value estimate and multi-step policy gradients to update the policy. Our approach outperforms state-of-the-art algorithms on multiple benchmark tasks in sample efficiency and convergence performance.
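The abstract names two mechanisms without giving their exact form: SENet-style channel attention with gating, and model rollouts used to reduce value-estimation bias. The following is a minimal NumPy sketch, assuming the standard squeeze-and-excitation block and a generic H-step model-based value target. The weight shapes, the reduction ratio `r`, and the callables `policy`, `model`, and `value_fn` are hypothetical placeholders for learned networks, not the authors' implementation. The expert-guidance step is not shown.

```python
import numpy as np


def squeeze_excite(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation channel gating (sketch of the standard SE block).

    feature_map: array of shape (C, H, W).
    w1: (C // r, C), w2: (C, C // r) are hypothetical bottleneck MLP weights.
    Returns the feature map with each channel scaled by a gate in (0, 1).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = feature_map.mean(axis=(1, 2))                  # shape (C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gating.
    hidden = np.maximum(0.0, w1 @ z + b1)              # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))  # shape (C,)
    # Recalibrate: scale every spatial position of a channel by its gate.
    return feature_map * gates[:, None, None]


def rollout_value_target(state, policy, model, value_fn, horizon, gamma):
    """H-step value target from an imagined rollout in a learned model.

    Assumed (hypothetical) interfaces: policy(s) -> action,
    model(s, a) -> (next_state, reward), value_fn(s) -> float.
    """
    target, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)
        s, r = model(s, a)
        target += discount * r     # accumulate discounted model reward
        discount *= gamma
    # Bootstrap the remaining return with the learned value function.
    return target + discount * value_fn(s)


# Usage with random (untrained) SE weights and a toy 1-D model.
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 5, 5))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y = squeeze_excite(x, w1, b1, w2, b2)
print(y.shape)  # (8, 5, 5)

# Toy dynamics: the state moves by the action and the reward is the new state.
v = rollout_value_target(0.0, policy=lambda s: 1.0,
                         model=lambda s, a: (s + a, s + a),
                         value_fn=lambda s: 0.0, horizon=3, gamma=0.5)
print(v)  # 2.75
```

Truncating the rollout at `horizon` and bootstrapping with `value_fn` is one common way to trade model error (longer rollouts) against value-function bias (shorter rollouts); the paper's actual horizon and weighting are not stated in the abstract.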