Hierarchical Prototypes for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning

Jiaxian Guo, Mingming Gong, Yali Du, Zhen Wang, Dacheng Tao
2023-01-01
Abstract: By incorporating an environment-specific factor into dynamics prediction, model-based reinforcement learning (MBRL) can generalise to environments with diverse dynamics. In most real-world scenarios the environment-specific factor is not observable, so existing methods attempt to estimate it from historical transition segments. However, earlier work was unable to learn distinct clusters for the environment-specific factors estimated from different environments, resulting in poor performance. To address this issue, we introduce a set of environmental prototypes, each serving as the environment-specific representation of one environment. Encouraging each learned environment-specific factor to resemble its assigned environmental prototype more closely enhances the discrimination between factors estimated from distinct environments. To learn such prototypes, we first construct a prototype for each sampled trajectory and then hierarchically combine trajectory prototypes with similar semantics into one environmental prototype. Experiments demonstrate that the environment-specific factors estimated by our method cluster better and consistently improve MBRL's generalisation performance across six environments.
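The two-level prototype idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how prototypes are built, merged, or used in the loss. The mean-pooled trajectory prototypes, the greedy cosine-similarity merge, the `threshold` and `temperature` values, the softmax-style loss, and all function names are assumptions made for illustration, and the factors are synthetic.

```python
import numpy as np

def trajectory_prototypes(factors, traj_ids):
    """One prototype per trajectory: the L2-normalised mean of its factors.
    Assumes traj_ids are the integers 0..T-1 (illustrative convention)."""
    ids = np.unique(traj_ids)
    protos = np.stack([factors[traj_ids == i].mean(axis=0) for i in ids])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def merge_prototypes(traj_protos, threshold=0.9):
    """Greedy merge (an assumed stand-in for the hierarchical step): a
    trajectory prototype joins the first group whose mean direction has
    cosine similarity >= threshold, otherwise it starts a new group."""
    groups = []
    assign = np.empty(len(traj_protos), dtype=int)
    for i, p in enumerate(traj_protos):
        for g, members in enumerate(groups):
            centre = traj_protos[members].mean(axis=0)
            centre /= np.linalg.norm(centre)
            if p @ centre >= threshold:
                members.append(i)
                assign[i] = g
                break
        else:
            groups.append([i])
            assign[i] = len(groups) - 1
    env_protos = np.stack([traj_protos[m].mean(axis=0) for m in groups])
    env_protos /= np.linalg.norm(env_protos, axis=1, keepdims=True)
    return env_protos, assign

def prototype_loss(factors, labels, protos, temperature=0.1):
    """Cross-entropy over cosine similarities: small when each factor is
    closest to its assigned prototype."""
    z = factors / np.linalg.norm(factors, axis=1, keepdims=True)
    logits = z @ protos.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(z)), labels].mean()

# Synthetic data: 4 trajectories, alternating between 2 hidden environments.
rng = np.random.default_rng(0)
centres = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
traj_ids = np.repeat(np.arange(4), 20)
factors = centres[traj_ids % 2] + 0.05 * rng.standard_normal((80, 4))

traj_protos = trajectory_prototypes(factors, traj_ids)
env_protos, assign = merge_prototypes(traj_protos)
labels = assign[traj_ids]  # each factor inherits its trajectory's group
loss = prototype_loss(factors, labels, env_protos)
print(env_protos.shape, assign, loss)
```

In this toy setting, trajectories drawn from the same hidden environment collapse into one environmental prototype, and the loss is lower for the recovered assignment than for a swapped one, which is the separation effect the abstract describes.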