Deep Reinforcement Learning Boosted by External Knowledge

Nicolas Bougie, Ryutaro Ichise
DOI: https://doi.org/10.1145/3167132.3167165
2017-12-12
Abstract: Recent improvements in deep reinforcement learning have allowed to solve problems in many 2D domains such as Atari games. However, in complex 3D environments, numerous learning episodes are required which may be too time consuming or even impossible especially in real-world scenarios. We present a new architecture to combine external knowledge and deep reinforcement learning using only visual input. A key concept of our system is augmenting image input by adding environment feature information and combining two sources of decision. We evaluate the performances of our method in a 3D partially-observable environment from the Microsoft Malmo platform. Experimental evaluation exhibits higher performance and faster learning compared to a single reinforcement learning model.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the following problem: in complex 3D environments, deep reinforcement learning (DRL) requires long training times and may fail to learn effective policies, especially in partially observable environments and real-world scenarios. To address this, the authors propose a new architecture that combines external knowledge with deep reinforcement learning to accelerate learning and improve performance.

### Specific problem description

1. **Learning efficiency in complex 3D environments**:
   - In 2D domains (such as Atari games), deep reinforcement learning has made significant progress. In complex 3D environments, however, the complexity and partial observability of the environment make learning extremely time-consuming or even infeasible.
   - For example, in real-world 3D environments, agents must extract features from visual input and make decisions, which requires a large amount of training time and computing resources.
2. **Challenges in partially observable environments**:
   - In partially observable environments, agents obtain only limited information, which makes learning more difficult. For example, in a virtual environment like Minecraft, agents can only see what lies within their field of view.
3. **Limitations of existing methods**:
   - Existing reinforcement learning methods perform poorly with large action and state spaces, especially without prior knowledge.
   - Although a modular model (an ensemble of experts) can reduce the number of actions each expert needs to consider, it still struggles with very complex environments.

### Solutions proposed in the paper

The authors propose a new framework, called **DRL-EK** (Deep Reinforcement Learning Boosted by External Knowledge), which aims to improve the performance of deep reinforcement learning models by introducing external knowledge.
Specifically:

- **Object recognition module**: Uses a deep convolutional neural network such as YOLO to identify objects in images and generate high-level features.
- **Reinforcement learning module**: Learns a policy with an algorithm such as A3C (Asynchronous Advantage Actor-Critic) and injects the high-level features into the neural network to help the model learn faster.
- **Knowledge-based decision-making module**: Uses external knowledge and the high-level features to select actions, compensating for the weakness of the reinforcement learning module early in training.
- **Action selection module**: Combines the outputs of the reinforcement learning module and the knowledge-based decision-making module to select the final action.

With this architecture, the paper shows that agents in partially observable 3D environments can learn faster and achieve higher performance.

### Summary

The main goal of this paper is to address low learning efficiency in complex 3D environments by combining external knowledge with deep reinforcement learning. The experimental results show that, compared with a single reinforcement learning model, this method achieves better performance in less time.
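
To make the interaction between the knowledge-based decision-making module and the action selection module more concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary does not say how the two outputs are combined, so the linear mixing rule, the decaying `knowledge_weight` schedule, the action names, and the `rules` mapping from detected object labels to actions are all assumptions made for illustration. The detector output is stubbed as a plain dictionary of label confidences instead of calling a real YOLO model.

```python
from typing import Dict, List, Sequence

# Hypothetical action set; the real environment's actions may differ.
ACTIONS = ["move_forward", "turn_left", "turn_right", "attack"]


def knowledge_scores(detected: Dict[str, float], rules: Dict[str, str]) -> List[float]:
    """Map detected objects (label -> confidence) to a distribution over ACTIONS.

    `rules` is an assumed form of external knowledge: label -> preferred action.
    """
    scores = [0.0] * len(ACTIONS)
    for label, confidence in detected.items():
        action = rules.get(label)
        if action is not None:
            scores[ACTIONS.index(action)] += confidence
    total = sum(scores)
    if total == 0.0:
        # No rule fired: fall back to a uniform distribution.
        return [1.0 / len(ACTIONS)] * len(ACTIONS)
    return [s / total for s in scores]


def knowledge_weight(step: int, decay_steps: int) -> float:
    """Assumed schedule: trust the rules fully at step 0, not at all after decay_steps."""
    return max(0.0, 1.0 - step / decay_steps)


def combine(policy_probs: Sequence[float], rule_probs: Sequence[float], weight: float) -> List[float]:
    """Linearly mix the policy and rule distributions, then renormalize."""
    mixed = [(1.0 - weight) * p + weight * k for p, k in zip(policy_probs, rule_probs)]
    total = sum(mixed)
    return [m / total for m in mixed]


def select_action(policy_probs: Sequence[float], detected: Dict[str, float],
                  rules: Dict[str, str], step: int, decay_steps: int = 10_000) -> str:
    """Pick the highest-scoring action from the mixed distribution (greedy, for determinism)."""
    weight = knowledge_weight(step, decay_steps)
    mixed = combine(policy_probs, knowledge_scores(detected, rules), weight)
    best = max(range(len(mixed)), key=mixed.__getitem__)
    return ACTIONS[best]


if __name__ == "__main__":
    policy = [0.7, 0.1, 0.1, 0.1]   # stand-in for the policy output of the RL module
    seen = {"enemy": 0.9}           # stand-in for object detector output
    rules = {"enemy": "attack"}     # hypothetical external knowledge
    for step in (0, 5_000, 10_000):
        print(step, select_action(policy, seen, rules, step))
```

Under these assumptions the rules dominate early in training, when the learned policy is still unreliable, and the policy takes over once the weight has decayed to zero. A real system would more likely sample from the mixed distribution than act greedily, and could learn or tune the mixing weight instead of fixing a schedule.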