Contrastive Unsupervised Learning of World Model with Invariant Causal Features

Rudra P.K. Poudel, Harit Pandya, Roberto Cipolla
DOI: https://doi.org/10.48550/arXiv.2209.14932
2022-09-30
Abstract: In this paper we present a world model, which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. The world-model-based reinforcement learning methods independently optimize representation learning and the policy. Thus naive contrastive loss implementation collapses due to a lack of supervisory signals to the representation learning module. We propose an intervention invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn the world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset. Moreover, our proposed model excels at the sim-to-real transfer of our perception learning module. Finally, we evaluate our approach on the DeepMind control suite and enforce invariance only implicitly since depth is not available. Nevertheless, our proposed model performs on par with the state-of-the-art counterpart.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
The paper addresses three main problems:

1. **Feature learning and policy optimization in model-based reinforcement learning**: Existing model-based reinforcement learning (MBRL) methods optimize representation learning and the control policy independently. As a result, a naive contrastive loss can collapse because the representation learning module lacks sufficient supervisory signals. How to combine the two effectively is therefore a key issue.
2. **Insufficient generalization**: When the environment changes or out-of-distribution (OoD) situations arise, the performance of existing methods often drops significantly. To address this, the authors propose a new world model that improves generalization through invariant causal features.
3. **Sim-to-real transfer**: Many models trained in simulated environments perform poorly in practice because they fail to adapt to the complexity and variation of the real world. The paper therefore explores how invariant causal features can improve the transfer performance of the perception learning module.

Specifically, the authors introduce a world model with invariant causal features (WMC) and use contrastive unsupervised learning to extract these features. They also design an intervention-invariant auxiliary task, such as depth prediction, so that the model learns geometric features that are independent of style, which improves its robustness and generalization. According to the paper, WMC outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks and also performs well in sim-to-real transfer of the perception learning module.

Experimental results show that the proposed model has better OoD generalization and sim-to-real transfer performance on the iGibson dataset than the baseline models.
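
The combination of a contrastive objective with a depth-prediction auxiliary task can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the InfoNCE form of the contrastive loss, the temperature of 0.1, the L1 depth loss, and the `depth_weight` parameter are all assumptions chosen for illustration. The sketch only shows how features from two style-augmented views of the same observation might be pulled together while a depth term supplies an extra supervisory signal.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss (assumed form): row i of z_a is the positive for row i of z_b.

    z_a, z_b: (batch, dim) feature arrays from two augmented views.
    """
    # L2-normalise so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; every other column is a negative.
    return -np.mean(np.diag(log_prob))


def depth_l1(pred_depth, true_depth):
    """Mean absolute error between predicted and reference depth (assumed loss)."""
    return np.mean(np.abs(pred_depth - true_depth))


def combined_loss(z_a, z_b, pred_depth, true_depth, depth_weight=1.0):
    """Contrastive term plus a weighted depth term (hypothetical weighting)."""
    return info_nce(z_a, z_b) + depth_weight * depth_l1(pred_depth, true_depth)
```

In a real pipeline, `z_a` and `z_b` would come from an encoder applied to two different style augmentations of the same RGB frame, and `pred_depth` from a decoder on the same features. Because depth is unchanged by style augmentation, the depth term gives the encoder a target that does not depend on appearance, which is the intuition the summary above describes.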