CURLing the Dream: Contrastive Representations for World Modeling in Reinforcement Learning

Victor Augusto Kich, Jair Augusto Bottega, Raul Steinmetz, Ricardo Bedin Grando, Ayano Yorozu, Akihisa Ohya
2024-09-01
Abstract: In this work, we present Curled-Dreamer, a novel reinforcement learning algorithm that integrates contrastive learning into the DreamerV3 framework to enhance performance in visual reinforcement learning tasks. By incorporating the contrastive loss from the CURL algorithm and a reconstruction loss from an autoencoder, Curled-Dreamer achieves significant improvements in various DeepMind Control Suite tasks. Our extensive experiments demonstrate that Curled-Dreamer consistently outperforms state-of-the-art algorithms, achieving higher mean and median scores across a diverse set of tasks. The results indicate that the proposed approach not only accelerates learning but also enhances the robustness of the learned policies. This work highlights the potential of combining different learning paradigms to achieve superior performance in reinforcement learning applications.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two challenges that reinforcement learning (RL) faces with high-dimensional visual inputs: sample efficiency and the quality of the learned representations. The authors propose Curled-Dreamer, an algorithm that integrates contrastive learning into the DreamerV3 framework. Specifically, Curled-Dreamer combines the contrastive loss from the CURL algorithm with a reconstruction loss from an autoencoder, so that the encoder captures more informative and discriminative features from visual inputs. In experiments on DeepMind Control Suite tasks, Curled-Dreamer achieves higher mean and median scores than the state-of-the-art algorithms it is compared against. The authors report that the approach both accelerates learning and improves the robustness of the learned policies. The work illustrates the potential of combining different learning paradigms to improve reinforcement learning performance.
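
To make the combination of the two losses concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a CURL-style InfoNCE loss with a bilinear similarity between query and key embeddings, plus a mean-squared-error reconstruction term. The function names, the weights contrastive_weight and recon_weight, and the random stand-in embeddings are all hypothetical. The summary above does not say how these terms are weighted or how they are attached to the DreamerV3 objective.

```python
import numpy as np


def info_nce_loss(queries, keys, W):
    """CURL-style contrastive (InfoNCE) loss with bilinear similarity q^T W k.

    Row i of `queries` and row i of `keys` are embeddings of two augmented
    views of the same observation (the positive pair); every other row in
    the batch acts as a negative.
    """
    logits = queries @ W @ keys.T                        # shape (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for row i is column i.
    return float(-np.mean(np.diag(log_probs)))


def reconstruction_loss(images, reconstructions):
    """Mean squared error between inputs and autoencoder reconstructions."""
    return float(np.mean((images - reconstructions) ** 2))


def combined_aux_loss(queries, keys, W, images, reconstructions,
                      contrastive_weight=1.0, recon_weight=1.0):
    """Weighted sum of both terms; the weights are illustrative assumptions."""
    return (contrastive_weight * info_nce_loss(queries, keys, W)
            + recon_weight * reconstruction_loss(images, reconstructions))


# Stand-in data: random vectors replace real encoder outputs.
rng = np.random.default_rng(0)
B, D = 8, 16
q = rng.normal(size=(B, D))
k_matched = q + 0.01 * rng.normal(size=(B, D))  # keys aligned with queries
k_shuffled = k_matched[::-1].copy()             # every positive pair broken
W = np.eye(D)

loss_matched = info_nce_loss(q, k_matched, W)
loss_shuffled = info_nce_loss(q, k_shuffled, W)

images = rng.uniform(size=(B, 3, 8, 8))
recon = images + 0.1
total = combined_aux_loss(q, k_matched, W, images, recon)
print(loss_matched < loss_shuffled)
```

In this toy setup the contrastive loss is lower when each query is paired with its own key than when the pairs are broken. That is the signal which pushes an encoder toward discriminative features, while the reconstruction term rewards embeddings that retain enough information to rebuild the input.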