Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning

Xin Liu, Yaran Chen, Dongbin Zhao
2024-05-20
Abstract: In visual Reinforcement Learning (RL), upstream representation learning largely determines the effect of downstream policy learning. Employing auxiliary tasks allows the agent to enhance visual representation in a targeted manner, thereby improving the sample efficiency and performance of downstream RL. Prior advanced auxiliary tasks all focus on extracting as much information as possible from limited experience (including observations, actions, and rewards) through their different auxiliary objectives, whereas in this article we first start from another perspective: auxiliary training data. We try to improve auxiliary representation learning for RL by enriching auxiliary training data, proposing **L**earning **F**uture representation with **S**ynthetic observations (**LFS**), a novel self-supervised RL approach. Specifically, we propose a training-free method to synthesize observations that may contain future information, as well as a data selection approach to eliminate unqualified synthetic noise. The remaining synthetic observations and the real observations then serve as auxiliary data for a clustering-based temporal association task used for representation learning. LFS allows the agent to access and learn from observations in advance, before they have appeared, so that it can quickly understand and exploit them when they occur later. In addition, LFS does not rely on rewards or actions, which gives it a wider scope of application (e.g., learning from video) than recent advanced auxiliary tasks. Extensive experiments demonstrate that LFS achieves state-of-the-art RL sample efficiency on challenging continuous-control tasks and enables advanced visual pre-training from action-free video demonstrations.
Machine Learning, Artificial Intelligence
### What problem does this paper attempt to solve?

This paper addresses the low sample efficiency of visual reinforcement learning (RL), which stems largely from poor representation learning. When dealing with high-dimensional inputs such as visual observations, traditional deep reinforcement learning (DRL) algorithms need many interaction steps and much training time to obtain effective policies, mainly because the representations they learn are of low quality. To improve sample efficiency and downstream policy learning, the paper proposes a new self-supervised method, **Learning Future representation with Synthetic observations (LFS)**. Rather than designing more complex auxiliary objectives as previous work did, LFS strengthens auxiliary representation learning by enriching the auxiliary training data.

#### Main problems and solutions

1. **Limitations of traditional methods**:
   - Traditional DRL algorithms rely on the reward function to learn feature representations and control policies simultaneously, which makes them sample-inefficient on complex high-dimensional inputs such as visual observations.
   - Existing advanced auxiliary tasks focus on extracting as much information as possible from limited experience (observations, actions, and rewards). They are therefore bounded by the finite experience available and may add training overhead.
2. **Innovations of LFS**:
   - **Synthesizing future observations**: LFS enriches the auxiliary training data with synthetic observations that may contain future information. It generates them with frame mask, a data synthesis method that requires no additional training.
   - **Data selection**: To reduce the impact of unqualified synthetic data, LFS introduces Latent Nearest-neighbor Clip (LNC), which uses real experience to filter out noisy synthetic observations in the latent semantic space.
   - **Clustering-based spatio-temporal association task**: LFS learns representations through a self-supervised, clustering-based spatio-temporal association task that uses no reward or action information, so it applies to a wider range of scenarios.
3. **Experimental verification**:
   - On multiple challenging continuous-control tasks, LFS achieves higher sample efficiency than existing methods.
   - Even without pre-training, LFS outperforms many advanced unsupervised RL pre-training methods.
   - LFS can also perform effective visual pre-training on non-expert videos, which existing advanced auxiliary tasks cannot do because they depend on actions or rewards.

### Summary

The main contribution of this paper is LFS, a new self-supervised learning method that enhances representation learning by enriching auxiliary training data. This significantly improves the sample efficiency of visual reinforcement learning, and the method shows superior performance across multiple application scenarios.
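The summary names frame mask as a training-free synthesis method but does not say how it builds new observations. As a minimal, hypothetical sketch (our assumption, not the paper's definition), a frame mask can be read as a binary mask over the slots of a frame-stacked observation that takes each frame from one of two real stacks, producing a stack that never appeared in the replay buffer. The names `frame_mask_synthesize` and `random_frame_mask` are illustrative only.

```python
import random
from typing import List, Sequence

# Toy types: a frame is a flattened image, a stack is k consecutive frames.
Frame = List[float]
Stack = List[Frame]


def frame_mask_synthesize(stack_a: Stack, stack_b: Stack, mask: Sequence[bool]) -> Stack:
    """Hypothetical frame mask: slot i comes from stack_a if mask[i] else stack_b.

    Training-free: it only recombines existing real frames, no network is fit.
    """
    if not (len(stack_a) == len(stack_b) == len(mask)):
        raise ValueError("stacks and mask must have the same number of frame slots")
    return [list(fa) if keep else list(fb) for fa, fb, keep in zip(stack_a, stack_b, mask)]


def random_frame_mask(num_frames: int, rng: random.Random) -> List[bool]:
    """Sample a mask that takes at least one slot from each source stack."""
    if num_frames < 2:
        raise ValueError("need at least two frame slots to mix two stacks")
    while True:
        mask = [rng.random() < 0.5 for _ in range(num_frames)]
        if any(mask) and not all(mask):
            return mask


if __name__ == "__main__":
    rng = random.Random(0)
    real_a = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]        # frames t-2, t-1, t of one clip
    real_b = [[10.0, 10.0], [11.0, 11.0], [12.0, 12.0]]  # frames from another clip
    mask = random_frame_mask(3, rng)
    print(mask, frame_mask_synthesize(real_a, real_b, mask))
```

Because nothing is learned, this kind of synthesis is cheap; the price is that some recombined stacks may be implausible, which is the kind of noise a subsequent selection step is meant to remove.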
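Latent Nearest-neighbor Clip (LNC) is described as using real experience to filter synthetic observations in the latent space. The sketch below is one hypothetical reading of that idea, assuming the embeddings have already been produced by the agent's encoder: a synthetic embedding is kept only if its nearest real embedding lies within a radius, here set (also an assumption) to the largest nearest-neighbor distance among the real embeddings themselves. The function names and the radius rule are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_distance(query: Vector, pool: Sequence[Vector]) -> float:
    """Distance from `query` to its nearest neighbour in `pool`."""
    return min(euclidean(query, p) for p in pool)


def radius_from_real(real_z: Sequence[Vector]) -> float:
    """Hypothetical threshold: the largest nearest-neighbour distance among real
    embeddings, i.e. how far apart genuine observations already are."""
    if len(real_z) < 2:
        raise ValueError("need at least two real embeddings")
    return max(
        nearest_distance(z, [p for j, p in enumerate(real_z) if j != i])
        for i, z in enumerate(real_z)
    )


def latent_nn_clip(real_z: Sequence[Vector], synth_z: Sequence[Vector], radius: float) -> List[int]:
    """Indices of synthetic embeddings whose nearest real embedding is within
    `radius`; everything farther away is discarded as synthetic noise."""
    if not real_z:
        raise ValueError("need at least one real embedding")
    return [i for i, z in enumerate(synth_z) if nearest_distance(z, real_z) <= radius]


if __name__ == "__main__":
    real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    synth = [(0.5, 0.5), (5.0, 5.0), (1.0, 1.0)]
    r = radius_from_real(real)             # 1.0 for these points
    print(latent_nn_clip(real, synth, r))  # [0, 2]: the far-away point is clipped
```

Anchoring the threshold to real-to-real distances keeps the filter scale-free with respect to the encoder; a fixed radius or a distance quantile would be equally plausible choices.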
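The clustering-based association task can be illustrated with a prototype-style loss in the spirit of methods such as SwAV. In this hypothetical sketch, an anchor embedding and the embedding of a temporally associated observation (real or synthetic) are each soft-assigned to a set of prototypes, and the anchor's assignment is trained to match the sharper assignment of its partner. The temperatures, names, and exact loss form are assumptions; what the sketch does reflect from the source is that the objective needs only observations, not rewards or actions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_probs(z: Vector, prototypes: Sequence[Vector], temperature: float) -> List[float]:
    """Soft assignment of embedding z to each prototype (cluster centre)."""
    return softmax([dot(z, c) / temperature for c in prototypes])


def temporal_association_loss(
    z_anchor: Vector,
    z_partner: Vector,
    prototypes: Sequence[Vector],
    tau_pred: float = 0.1,
    tau_target: float = 0.05,
) -> float:
    """Cross-entropy between the anchor's predicted cluster assignment and the
    sharper assignment of a temporally associated observation. In a real
    trainer the target q would be treated as a constant (no gradient)."""
    p = cluster_probs(z_anchor, prototypes, tau_pred)
    q = cluster_probs(z_partner, prototypes, tau_target)
    return -sum(qi * math.log(max(pi, 1e-12)) for qi, pi in zip(q, p))


if __name__ == "__main__":
    protos = [(1.0, 0.0), (0.0, 1.0)]
    anchor = (1.0, 0.0)
    print(temporal_association_loss(anchor, (0.9, 0.1), protos))  # small: same cluster
    print(temporal_association_loss(anchor, (0.0, 1.0), protos))  # large: different cluster
```

Prototype methods usually also balance the target assignments across a batch (for example with Sinkhorn-Knopp iterations) so that all embeddings do not collapse onto one cluster; that step is omitted here for brevity.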