Modern Deep Reinforcement Learning Algorithms

Sergey Ivanov, Alexander D'yakonov
DOI: https://doi.org/10.48550/arXiv.1906.10025
2019-07-07
Abstract: Recent advances in Reinforcement Learning, grounded on combining classical theoretical results with Deep Learning paradigm, led to breakthroughs in many artificial intelligence tasks and gave birth to Deep Reinforcement Learning (DRL) as a field of research. In this work latest DRL algorithms are reviewed with a focus on their theoretical justification, practical limitations and observed empirical properties.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the challenges that modern deep reinforcement learning (DRL) algorithms face in practice. It covers the following areas:

1. **Theoretical Basis and Practical Limitations**: The paper reviews recent DRL algorithms with a focus on their theoretical justification, practical limitations, and observed empirical properties. This includes how classical theoretical results are combined with the deep learning paradigm to overcome the limitations of traditional reinforcement learning methods.
2. **Value-Function Methods**: The paper details value-based methods, especially Deep Q-Learning (DQN) and its extensions: Double DQN, Dueling DQN, Noisy DQN, and Prioritized Experience Replay. These methods learn a value function and derive the policy from it; the extensions aim to improve learning efficiency and performance.
3. **Distributional Methods**: The paper explores the theoretical basis and practical application of Distributional Reinforcement Learning. Instead of estimating only the expected return, this approach models the entire return distribution, which provides richer information and can improve the stability and generalization of learning.
4. **Policy Gradient Methods**: The paper discusses policy gradient methods, including the Policy Gradient Theorem, REINFORCE, Advantage Actor-Critic (A2C), Generalized Advantage Estimation (GAE), Natural Policy Gradient (NPG), Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO). These methods optimize the objective directly with respect to the policy parameters, are suitable for continuous action spaces, and parallelize and scale well.
5. **Experiments and Discussions**: The paper evaluates the above algorithms experimentally on several standard test environments (such as CartPole and Pong) and discusses the practical details of applying them. Special attention is paid to the high cost of data generation, large computational resource requirements, and hyperparameter sensitivity.

Overall, the paper aims to comprehensively review and analyze modern deep reinforcement learning algorithms, explore their performance on different tasks, and put forward suggestions for improvement to promote the further development of the field.
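To make two of the listed ideas concrete, the following is a minimal NumPy sketch and not code from the paper. It contrasts the one-step DQN target with the Double DQN target, and computes the clipped surrogate objective commonly associated with PPO. All function names, argument names, and default values (`gamma=0.99`, `eps=0.2`) are assumptions chosen for illustration; the Q-value arrays are assumed to have shape `(batch, num_actions)`.

```python
import numpy as np

def dqn_target(rewards, next_q_target, dones, gamma=0.99):
    """One-step DQN target: r + gamma * max_a Q_target(s', a)."""
    max_next = next_q_target.max(axis=1)
    # (1 - dones) removes the bootstrap term for terminal transitions.
    return rewards + gamma * (1.0 - dones) * max_next

def double_dqn_target(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN target: the online network picks the action,
    the target network evaluates it."""
    best_actions = next_q_online.argmax(axis=1)
    chosen = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * chosen

def ppo_clip_objective(ratio, advantages, eps=0.2):
    """Clipped surrogate: mean of min(ratio * A, clip(ratio, 1-eps, 1+eps) * A)."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantages, clipped * advantages).mean()

if __name__ == "__main__":
    # Hypothetical toy batch of two transitions; the second one is terminal.
    rewards = np.array([1.0, 0.0])
    dones = np.array([0.0, 1.0])
    q_online = np.array([[1.0, 2.0], [3.0, 0.5]])
    q_target = np.array([[5.0, 0.5], [0.2, 4.0]])
    print(dqn_target(rewards, q_target, dones, gamma=0.9))
    print(double_dqn_target(rewards, q_online, q_target, dones, gamma=0.9))
    print(ppo_clip_objective(np.array([1.5, 0.5]), np.array([1.0, -1.0])))
```

In the toy batch the two targets differ on the first transition because the online and target networks disagree about the best next action. That decoupling of action selection from action evaluation is the design choice that separates Double DQN from plain DQN.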