Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

Mohammad Babaeizadeh, Iuri Frosio, Stephen Tyree, Jason Clemons, Jan Kautz
DOI: https://doi.org/10.48550/arXiv.1611.06256
2017-03-03
Abstract: We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speed up compared to a CPU implementation; we make it publicly available to other researchers at https://github.com/NVlabs/GA3C.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the computational efficiency of the Asynchronous Advantage Actor-Critic (A3C) algorithm on a GPU. Specifically, the authors focus on how to use the GPU's computing power to accelerate A3C training in deep reinforcement learning tasks. The paper notes that the traditional A3C algorithm runs mainly on the CPU. It achieves very good results, but it uses computing resources inefficiently, especially when training deep neural networks (DNNs), which is computationally demanding. The paper therefore proposes a hybrid CPU/GPU implementation of A3C, called GA3C. By optimizing the system architecture and the scheduling strategy, GA3C significantly improves training speed and GPU utilization.

### Problems Solved in the Paper

1. **Improve Computational Efficiency**: The traditional A3C algorithm performs relatively well on a multi-core CPU, but it does not exploit the computing power of a modern GPU. The hybrid CPU/GPU architecture aims to improve computational efficiency, especially for large-scale deep neural networks.
2. **Reduce Latency**: In A3C, training data is generated and consumed in a fixed order, so the GPU sits idle most of the time while it waits for new data. The paper reduces this latency by introducing prediction queues, training queues, and a dynamic scheduling strategy, which lets the GPU work more efficiently.
3. **Optimize System Resource Allocation**: The paper analyzes in detail how the number of each component (predictors, trainers, and agents) affects system performance, and it proposes a method that adjusts these parameters automatically to find the best resource allocation. This improves training speed while maintaining the stability and convergence of the system.

### Main Contributions

- **GA3C Architecture**: Proposes a new hybrid CPU/GPU architecture. By centrally managing and optimizing the prediction and training processes, it significantly improves the training speed of the A3C algorithm.
- **Dynamic Scheduling Strategy**: Introduces a strategy that dynamically adjusts the number of predictors, trainers, and agents to suit different hardware environments and task requirements, so that the system can reach its best performance under different conditions.
- **Performance Evaluation**: Verifies through detailed experiments the performance improvement of GA3C under different system configurations. The speedup over the pure CPU implementation is especially significant for large-scale deep neural networks.

In conclusion, by optimizing the computational architecture and scheduling strategy of A3C, this paper addresses the inefficient use of computing resources in the traditional CPU-based algorithm and provides a more efficient solution for deep reinforcement learning tasks.
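
The queue-based design summarized above can be illustrated with a minimal sketch. It is not the GA3C code: it uses Python threads and a stub in place of the TensorFlow model, and every name in it (`StubModel`, `drain`, `predictor`, `trainer`, `agent`, `run`) is a hypothetical chosen for illustration. It only shows the data flow: agents send states to a shared prediction queue, predictors batch those requests for the model and send each result back to the requesting agent, and agents push experiences to a training queue that trainers consume in batches.

```python
import queue
import threading


class StubModel:
    """Hypothetical stand-in for the shared DNN; it counts calls instead of computing."""

    def __init__(self):
        self.lock = threading.Lock()
        self.predict_batches = 0
        self.samples_trained = 0

    def predict(self, states):
        with self.lock:
            self.predict_batches += 1
        return [s % 2 for s in states]  # fake "action" for each state

    def train(self, batch):
        with self.lock:
            self.samples_trained += len(batch)


def drain(q, max_batch):
    """Block for one item, then take whatever else is already queued.

    Returns (batch, stop); stop is True once the None sentinel is seen.
    """
    batch = []
    item = q.get()
    while item is not None:
        batch.append(item)
        if len(batch) == max_batch:
            return batch, False
        try:
            item = q.get_nowait()
        except queue.Empty:
            return batch, False
    return batch, True


def predictor(model, prediction_q, reply_qs, max_batch):
    stop = False
    while not stop:
        requests, stop = drain(prediction_q, max_batch)
        if requests:
            agent_ids, states = zip(*requests)
            # One batched model call serves several agents at once.
            for agent_id, action in zip(agent_ids, model.predict(list(states))):
                reply_qs[agent_id].put(action)


def trainer(model, training_q, max_batch):
    stop = False
    while not stop:
        batch, stop = drain(training_q, max_batch)
        if batch:
            model.train(batch)


def agent(agent_id, steps, prediction_q, reply_q, training_q):
    state = agent_id  # placeholder for an environment observation
    for _ in range(steps):
        prediction_q.put((agent_id, state))
        action = reply_q.get()  # wait for the batched prediction
        reward = 1.0  # placeholder reward
        training_q.put((state, action, reward))
        state += 1


def run(n_agents=4, n_predictors=2, n_trainers=2, steps=5, max_batch=8):
    model = StubModel()
    prediction_q, training_q = queue.Queue(), queue.Queue()
    reply_qs = [queue.Queue() for _ in range(n_agents)]
    predictors = [
        threading.Thread(target=predictor,
                         args=(model, prediction_q, reply_qs, max_batch))
        for _ in range(n_predictors)
    ]
    trainers = [
        threading.Thread(target=trainer, args=(model, training_q, max_batch))
        for _ in range(n_trainers)
    ]
    agents = [
        threading.Thread(target=agent,
                         args=(i, steps, prediction_q, reply_qs[i], training_q))
        for i in range(n_agents)
    ]
    for t in predictors + trainers + agents:
        t.start()
    for t in agents:
        t.join()
    # One sentinel per worker, queued after all real work, shuts the workers down.
    for _ in predictors:
        prediction_q.put(None)
    for _ in trainers:
        training_q.put(None)
    for t in predictors + trainers:
        t.join()
    return model
```

Calling `run()` returns the stub model, whose counters show how many prediction batches were issued and how many experiences were trained on. The `n_agents`, `n_predictors`, and `n_trainers` arguments correspond to the three component counts that the summary says are tuned; this sketch keeps them fixed and does not implement the dynamic adjustment.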