Model-Assisted Reinforcement Learning with Adaptive Ensemble Value Expansion

Yunkun Xu, Zhenyu Liu, Guifang Duan, Jianrong Tan
DOI: https://doi.org/10.1109/ceect53198.2021.9672626
2021-01-01
Abstract: Integrated with model-based approaches, reinforcement learning can achieve high performance with low sample complexity. However, an inaccurate learned dynamics model degrades performance, and the cumulative bias grows with the length of the imagined rollout. A key challenge is to improve sample efficiency without introducing significant errors. In this paper, Model-assisted Adaptive Ensemble Value Expansion (MAEVE) is proposed, which augments value expansion with imaginary training. By explicitly estimating the uncertainty of the dynamics and the value function with a stochastic ensemble method, MAEVE adapts the rollout length to maintain a dynamic balance between sample complexity and computational complexity. Because the cumulative model bias differs across rollout lengths, MAEVE adjusts the sampling probabilities of samples at different imagination depths instead of treating them equally. MAEVE thereby ensures that the learned dynamics model is used only when it does not introduce serious errors. Altogether, our approach significantly increases sample efficiency compared to model-free and model-based baselines on challenging continuous control benchmarks, without performance degradation.
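The two mechanisms described in the abstract (uncertainty-driven rollout length and depth-dependent sampling probabilities) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names, the use of ensemble variance as the uncertainty measure, the cumulative-threshold rule, and the softmax weighting with a `temperature` parameter are all assumptions chosen for illustration.

```python
import numpy as np


def adaptive_rollout_length(ensemble_preds, threshold):
    """Pick a rollout length from ensemble disagreement (hypothetical rule).

    ensemble_preds: array of shape (K, H, D) holding the states predicted by
    K ensemble members at each of H imagination depths, with D state dims.
    Returns the chosen length and the cumulative disagreement per depth.
    """
    # Disagreement per depth: variance across members, averaged over state dims.
    disagreement = ensemble_preds.var(axis=0).mean(axis=-1)  # shape (H,)
    # Model error compounds with depth, so accumulate it along the rollout.
    cumulative = np.cumsum(disagreement)
    within = cumulative <= threshold
    # Longest prefix under the threshold; argmin finds the first False entry.
    length = len(within) if within.all() else int(np.argmin(within))
    return length, cumulative


def depth_sampling_probs(cumulative, length, temperature=1.0):
    """Assumed softmax weighting: deeper, more uncertain samples are drawn less often."""
    if length == 0:
        return np.array([])
    logits = -cumulative[:length] / temperature
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()
```

Under these assumptions, a rollout is truncated as soon as the accumulated ensemble disagreement exceeds the threshold, and the remaining depths are sampled with probabilities that decay as that accumulated disagreement grows.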