Accelerating exploration and representation learning with offline pre-training

Bogdan Mazoure, Jake Bruce, Doina Precup, Rob Fergus, Ankit Anand
2023-04-01
Abstract: Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward separately from a single collection of human demonstrations can significantly improve the sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the poor sample efficiency and performance of reinforcement learning (RL) algorithms on long-horizon sequential decision-making tasks, which are hard because they require multi-step reasoning. Its approach is to accelerate exploration and representation learning through offline pre-training. From a single offline dataset of human demonstrations, it learns two models separately: a state representation, trained with noise-contrastive estimation, and a model of auxiliary reward. Combining the learned representation with the reward-driven exploration is reported to significantly improve sample efficiency on the NetHack benchmark.

The main contributions of the paper are as follows:

1. **Proposing a new method**: Offline pre-training of a state representation and an auxiliary reward model, each learned separately from the same dataset, to improve the agent's exploration and representation capabilities in long-horizon tasks.
2. **Experimental verification**: Experiments in the NetHack environment demonstrate the effectiveness of the proposed method, particularly its gains in sample efficiency and performance.
3. **Ablation analysis**: The paper compares the experimental results of different methods and ablates components of its experimental setting to show where combining exploration and representation learning helps, including in sparse-reward tasks.

Overall, the goal of the paper is to make reinforcement learning agents more efficient and effective on long-horizon decision-making tasks by improving exploration and representation learning.
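
To make the two pre-trained components more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the noise-contrastive objective takes the common InfoNCE form, in which each anchor embedding should score highest against its own positive and the other positives in the batch serve as negatives. It also assumes the auxiliary reward is simply added to the environment reward with a fixed weight. The names `info_nce_loss` and `shaped_reward`, and the `temperature` and `weight` values, are hypothetical and chosen only for illustration.

```python
import math


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of embedding pairs.

    anchors[i] and positives[i] form a matching pair; every other
    positives[j] in the batch is treated as a negative for anchors[i].
    Embeddings are lists of floats and must be non-zero vectors.
    """
    def normalise(vector):
        # Scale to unit length so the dot product is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    a = [normalise(v) for v in anchors]
    p = [normalise(v) for v in positives]

    total = 0.0
    for i, a_i in enumerate(a):
        # Similarity of anchor i to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature
                  for p_j in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of picking the true pair (index i).
        total += log_denominator - logits[i]
    return total / len(a)


def shaped_reward(env_reward, aux_reward, weight=0.1):
    """Assumed combination rule: environment reward plus weighted auxiliary reward."""
    return env_reward + weight * aux_reward


# Toy usage: correctly paired embeddings give a much lower loss than mispaired ones.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(embeddings, embeddings)
mismatched = info_nce_loss(embeddings, embeddings[::-1])
```

In a real agent the embeddings would come from an encoder applied to observations from the demonstration dataset, and the auxiliary reward from a separately trained model. Both are replaced by hand-written numbers here so the sketch stays self-contained.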