Domain Adaptation for Reinforcement Learning on the Atari

Thomas Carr, Maria Chli, George Vogiatzis
DOI: https://doi.org/10.48550/arXiv.1812.07452
2018-12-19
Abstract: Deep reinforcement learning agents have recently been successful across a variety of discrete and continuous control tasks; however, they can be slow to train and require a large number of interactions with the environment to learn a suitable policy. This is borne out by the fact that a reinforcement learning agent has no prior knowledge of the world, no pre-existing data to depend on and so must devote considerable time to exploration. Transfer learning can alleviate some of the problems by leveraging learning done on some source task to help learning on some target task. Our work presents an algorithm for initialising the hidden feature representation of the target task. We propose a domain adaptation method to transfer state representations and demonstrate transfer across domains, tasks and action spaces. We utilise adversarial domain adaptation ideas combined with an adversarial autoencoder architecture. We align our new policies' representation space with a pre-trained source policy, taking target task data generated from a random policy. We demonstrate that this initialisation step provides significant improvement when learning a new reinforcement learning task, which highlights the wide applicability of adversarial adaptation methods; even as the task and label/action space also changes.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **how to accelerate Reinforcement Learning (RL) on new tasks and environments through domain adaptation, especially when the action spaces and reward structures differ**.

### Specific problem description

1. **High sample complexity**: Deep Reinforcement Learning (DRL) agents need a large number of environment interactions to learn a suitable policy. An agent typically starts from scratch, with no prior knowledge of the world and no pre-existing data, so it must spend considerable time exploring.
2. **Difficult cross-domain transfer**: When tasks span different input domains, direct transfer methods may degrade performance, because the representation spaces of the source task and the target task differ substantially and are hard to align.
3. **Initialization problem**: Randomly initialized neural network parameters make the early stage of learning inefficient, especially in complex environments with large action spaces.

### Solutions proposed in the paper

To address these problems, the authors propose a domain adaptation method based on an Adversarial AutoEncoder (AAE), which includes the following aspects:

- **Initialization of hidden feature representations**: The hidden feature representation of the target task is initialized to resemble that of the source task, which accelerates learning.
- **Adversarial domain adaptation**: Adversarial domain adaptation ideas are combined with the AAE architecture to align the state representations of the target task with those of the source task. Specifically, adversarial training pushes the embedding vectors produced by the target task's encoder as close as possible to the embedding vectors of the source task.
- **Unsupervised alignment**: Adversarial training aligns the source and target representations without relying on labels, which addresses the mismatch between the representation spaces of different tasks.

### Experimental verification

The authors ran experiments in the Atari game environment to verify the effectiveness of the method. The results show that it can significantly reduce the number of samples required to learn new tasks and can achieve better performance under different action spaces and reward structures.

### Key formulas

1. **Discounted expected reward**:
   \[ \mathbb{E}_\pi\left[\sum_{t = 0}^{T}\gamma^t R(s_t,a_t)\right] \]
   where \(\gamma\) is the discount factor, and \(R(s_t,a_t)\) is the immediate reward for taking action \(a_t\) in state \(s_t\).
2. **Autoencoder loss function**:
   \[ L(X)=\frac{1}{N}\sum_{i = 1}^{N}\|x_i - AE(x_i)\|^2 \]
   where \(x_i\) is the input sample, and \(AE(x_i)\) is the output of the autoencoder.
3. **Generative Adversarial Network (GAN) loss function**:
   \[ \min_G\max_D V(D,G)=\mathbb{E}_{x\sim P_{data}(x)}[\log D(x)]+\mathbb{E}_{z\sim P_z(z)}[\log(1 - D(G(z)))] \]
   where \(D\) is the discriminator and \(G\) is the generator.
4. **Wasserstein GAN (WGAN) loss function**:
   \[ \min_G\max_D V(D,G)=\mathbb{E}_{x\sim P_{data}(x)}[D(x)]-\mathbb{E}_{z\sim P_z(z)}[D(G(z))] \]

With these methods, the authors show how to achieve effective knowledge transfer between different Atari games, thus accelerating the learning of reinforcement learning agents.
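
As a sanity check on the key formulas, the following minimal Python sketch evaluates each quantity on small hand-made inputs. It is an illustration only, not the authors' implementation: the function names (`discounted_return`, `reconstruction_loss`, `gan_value`, `wgan_value`) are hypothetical, expectations are approximated by plain sample averages, and the GAN value uses the standard form \(\log(1 - D(G(z)))\) for the generated-sample term.

```python
import math


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one finite episode (t starts at 0)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def reconstruction_loss(inputs, reconstructions):
    """Mean over samples of the squared L2 distance ||x_i - AE(x_i)||^2."""
    total = 0.0
    for x, x_hat in zip(inputs, reconstructions):
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(inputs)


def gan_value(d_real, d_fake):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real and d_fake hold discriminator outputs in the open interval (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def wgan_value(critic_real, critic_fake):
    """V(D, G) = E[D(x)] - E[D(G(z))], with unbounded critic scores."""
    mean_real = sum(critic_real) / len(critic_real)
    mean_fake = sum(critic_fake) / len(critic_fake)
    return mean_real - mean_fake


# Toy inputs chosen so the results are easy to verify by hand.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))     # 1 + 0.5 + 0.25 = 1.75
print(reconstruction_loss([[1.0, 2.0], [0.0, 0.0]],
                          [[1.0, 0.0], [0.0, 1.0]]))     # (4 + 1) / 2 = 2.5
print(gan_value([0.5], [0.5]))                           # 2 * log(0.5), about -1.386
print(wgan_value([2.0, 4.0], [1.0, 1.0]))                # 3.0 - 1.0 = 2.0
```

The `gan_value` case with every discriminator output equal to 0.5 corresponds to a discriminator that cannot tell real from generated samples, where the value is \(-\log 4\). In the representation-alignment setting described above, the "real" samples would play the role of source-task embeddings and the "generated" samples the role of target-task embeddings; that mapping is an interpretation of the summary, not code from the paper.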