Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control

Shangtong Zhang, Osmar R. Zaiane
DOI: https://doi.org/10.48550/arXiv.1712.00006
2018-03-08
Abstract: Reinforcement Learning and the Evolutionary Strategy are two major approaches to addressing complicated control problems. Both are strong contenders and have their own devotee communities. Both groups have been very active in developing new advances in their own domain and devising, in recent years, leading-edge techniques to address complex continuous control tasks. Here, in the context of Deep Reinforcement Learning, we formulate a parallelized version of the Proximal Policy Optimization method and a Deep Deterministic Policy Gradient method. Moreover, we conduct a thorough comparison between the state-of-the-art techniques in both camps for continuous control: evolutionary methods and Deep Reinforcement Learning methods. The results show there is no consistent winner.
Machine Learning, Artificial Intelligence
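The abstract contrasts evolutionary methods with Deep Reinforcement Learning. To make the evolutionary side concrete, here is a minimal sketch of a basic Gaussian evolution strategy. It updates parameters from fitness scores alone, with no backpropagation. This is a generic variant, a search-gradient estimate built from standardized fitness scores, and not necessarily the ES the paper benchmarks. The names `es_step`, the population size, the noise scale `sigma`, the learning rate `lr`, and the toy fitness function are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Optional


def es_step(
    theta: List[float],
    fitness: Callable[[List[float]], float],
    pop_size: int = 50,
    sigma: float = 0.1,
    lr: float = 0.01,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """One update of a basic Gaussian evolution strategy (illustrative sketch).

    Estimates the gradient of expected fitness as
        g = 1 / (pop_size * sigma) * sum_i z_i * eps_i,
    where eps_i ~ N(0, I) and z_i is the standardized fitness of
    theta + sigma * eps_i.
    """
    rng = rng or random.Random()
    dim = len(theta)
    # Sample one Gaussian perturbation per population member.
    noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    # Evaluate each perturbed parameter vector (in control: episode return).
    scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noise]
    # Standardize fitness (assumed variance-reduction step); guard zero spread.
    mean = sum(scores) / pop_size
    std = (sum((s - mean) ** 2 for s in scores) / pop_size) ** 0.5 or 1.0
    z = [(s - mean) / std for s in scores]
    # Fitness-weighted sum of perturbations gives the search-gradient estimate.
    grad = [
        sum(z[i] * noise[i][j] for i in range(pop_size)) / (pop_size * sigma)
        for j in range(dim)
    ]
    # Gradient ascent on expected fitness.
    return [t + lr * g for t, g in zip(theta, grad)]


# Usage: maximize a toy fitness whose peak is at (3, -1).
rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(300):
    theta = es_step(theta, lambda p: -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2), rng=rng)
```

In a control task, `fitness` would run one or more episodes with the perturbed policy weights and return the total reward. Each population member is evaluated independently, so those evaluations are natural to run in parallel.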
What problem does this paper attempt to address?
The paper compares the performance of Deep Reinforcement Learning (DRL) and Evolutionary Methods (EM) on continuous control tasks. Specifically, the authors use systematic comparative experiments to evaluate the strengths and weaknesses of the two families across different tasks. The goal is to give researchers and practitioners a basis for choosing an appropriate algorithm.

### Research Background and Problems

1. **Reinforcement Learning (RL)** and **Evolutionary Strategy (ES)** are the two main approaches to solving complex control problems.
2. Both have made significant progress in their respective fields, and each has its own dedicated community.
3. The two correspond, respectively, to individual learning and species evolution in nature. Computationally, however, there is not yet a unified framework that combines them.
4. Understanding the strengths and weaknesses of each is therefore crucial for choosing the appropriate algorithm for a given task.

### Main Contributions of the Paper

1. **Systematic Comparison**: The authors systematically compare current state-of-the-art deep reinforcement learning and evolutionary methods, focusing on continuous control tasks.
2. **Parallelized Implementation**: To make full use of modern computing resources, the authors implemented parallelized versions of all algorithms.
3. **Proposal of New Algorithms**: The authors formulate parallelized versions of Proximal Policy Optimization (PPO) and Deep Deterministic Policy Gradient (DDPG) and demonstrate their potential on continuous control tasks.
4. **Performance Evaluation**: The authors measure every algorithm's performance against both wall-clock time and the number of environment interaction steps. This gives an empirical picture of running speed and data efficiency.
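The contributions include a parallelized PPO. The core of PPO is its clipped surrogate objective, $L = \mathbb{E}_t[\min(r_t A_t, \operatorname{clip}(r_t, 1-\epsilon, 1+\epsilon) A_t)]$, where $r_t$ is the ratio of new to old action probabilities and $A_t$ is an advantage estimate.

The sketch below only evaluates that objective on per-sample statistics. In practice an autodiff framework would differentiate it with respect to the policy parameters. This summary does not say how the paper distributes work across workers. The synchronous scheme in the hypothetical helper `pooled_objective`, which pools every worker's rollouts into one batch, is purely an assumption. The value `clip_eps=0.2` is a commonly used default, not a setting taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# One transition's statistics: (log-prob under the old policy that collected it,
# log-prob under the policy being optimized, advantage estimate).
Sample = Tuple[float, float, float]


def clipped_surrogate(samples: Sequence[Sample], clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate objective (to be maximized), averaged over samples."""
    total = 0.0
    for logp_old, logp_new, adv in samples:
        ratio = math.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(samples)


def pooled_objective(worker_batches: List[Sequence[Sample]], clip_eps: float = 0.2) -> float:
    """Hypothetical synchronous parallel scheme: concatenate the rollouts
    collected by every worker, then evaluate one objective on the pooled batch."""
    pooled = [s for batch in worker_batches for s in batch]
    return clipped_surrogate(pooled, clip_eps)
```

Consider a sample with a positive advantage whose probability ratio is $e^{0.5} \approx 1.65$. Its contribution is capped at $1 + \epsilon = 1.2$ times the advantage, so the update gains nothing from pushing the policy further in that direction. With a negative advantage, the unclipped term is kept, which preserves the pessimistic bound.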
### Experimental Setup

- **Test Tasks**: Classic toy tasks (such as the inverted pendulum), tasks requiring fine-grained exploration (such as the continuous lunar lander), and tasks with rich dynamics (such as the bipedal walker).
- **Performance Metrics**: Performance is measured against both the number of environment steps and wall-clock time.
- **Network Structure**: The policy function and the value function are each parameterized by a neural network with two hidden layers. Both small and large networks are tested.

### Results and Discussion

1. **Final Performance**: Relative final performance depends strongly on the task. For example, NEAT performs well on the Box2D tasks but fails on the simple inverted pendulum task. Evolutionary methods do better on tasks requiring fine-grained exploration, while deep reinforcement learning is superior on tasks with rich dynamics.
2. **Learning Speed**: On the inverted pendulum task, deep reinforcement learning methods outperform evolutionary methods in both environment steps and wall-clock time. On the bipedal walker task, most deep reinforcement learning methods perform well, but DDPG runs slowly.
3. **Stability**: Deep reinforcement learning methods show large variance, indicating high sensitivity to initialization and randomness. Evolutionary methods, owing to better exploration, are more stable.
4. **Scalability**: Deep reinforcement learning methods usually improve as the network grows, while evolutionary methods sometimes perform better with small networks.

### Summary

Through a systematic comparison, this paper shows where deep reinforcement learning and evolutionary methods each excel or fall short across different tasks. It provides a useful reference for future research and applications.
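The experimental setup parameterizes the policy and value functions with two-hidden-layer networks at small and large scale. Below is a dependency-free sketch of such a network. Several choices are assumptions made for illustration, not the paper's settings:

- the tanh activations;
- the uniform fan-in initializer;
- the hidden widths (16 for "small", 64 for "large");
- the 3-D observation and 1-D action sizes;
- the hypothetical class name `TwoHiddenLayerMLP`.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Uniform fan-in scaling: an illustrative choice, not the paper's initializer.
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], layer: Layer,
          act: Optional[Callable[[float], float]] = None) -> List[float]:
    # Affine map W x + b, optionally followed by an elementwise activation.
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [act(v) for v in out] if act else out


class TwoHiddenLayerMLP:
    """Input -> hidden -> hidden -> output, with tanh hidden activations."""

    def __init__(self, n_in: int, n_out: int, hidden: int, rng: random.Random,
                 squash_output: bool = False) -> None:
        self.layers = [
            init_layer(n_in, hidden, rng),
            init_layer(hidden, hidden, rng),
            init_layer(hidden, n_out, rng),
        ]
        # A policy may squash its output into [-1, 1]; a value head stays linear.
        self.squash_output = squash_output

    def __call__(self, x: List[float]) -> List[float]:
        h = dense(x, self.layers[0], math.tanh)
        h = dense(h, self.layers[1], math.tanh)
        return dense(h, self.layers[2], math.tanh if self.squash_output else None)

    def num_params(self) -> int:
        # Weights plus biases across all three layers.
        return sum(len(w) * len(w[0]) + len(b) for w, b in self.layers)


# Hypothetical sizes: 3-D observation, 1-D action, "small" vs "large" hidden width.
rng = random.Random(0)
small_policy = TwoHiddenLayerMLP(3, 1, hidden=16, rng=rng, squash_output=True)
large_policy = TwoHiddenLayerMLP(3, 1, hidden=64, rng=rng, squash_output=True)
value_fn = TwoHiddenLayerMLP(3, 1, hidden=64, rng=rng)  # separate value network
```

`num_params` makes the size difference concrete: 353 parameters at width 16 versus 4481 at width 64, since the hidden-to-hidden layer grows quadratically with width. Deep RL methods train these weights by backpropagation. Weight-perturbing evolutionary methods such as ES instead search directly over this parameter vector, so its size matters for the scalability results reported above.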