Abstract: We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speed up compared to a CPU implementation; we make it publicly available to other researchers at https://github.com/NVlabs/GA3C.

What problem does this paper attempt to address?

The main problem this paper addresses is the computational efficiency of the Asynchronous Advantage Actor-Critic (A3C) algorithm: specifically, how to use the computing power of a GPU to accelerate A3C training on deep reinforcement learning tasks. The paper notes that the traditional A3C algorithm runs mainly on the CPU. Although it achieves very good results, it uses computing resources inefficiently, particularly when training deep neural networks (DNNs), which is computationally demanding. The paper therefore proposes a hybrid CPU/GPU implementation of A3C, called GA3C. By optimizing the system architecture and scheduling strategy, GA3C significantly improves training speed and GPU utilization.

### Problems Solved in the Paper

1. **Improve Computational Efficiency**: When the traditional A3C algorithm runs on a multi-core CPU, it achieves relatively good performance but does not exploit the computing power of modern GPUs. By designing a hybrid CPU/GPU architecture, the paper aims to improve computational efficiency, especially for large-scale deep neural networks.
2. **Reduce Latency**: In A3C, training data must be generated before it can be used, so the GPU sits idle much of the time while it waits for new data. By introducing prediction queues, training queues, and a dynamic scheduling strategy, the paper reduces this latency and keeps the GPU busier.
3. **Optimize System Resource Allocation**: The paper analyzes in detail how the number of each component (predictors, trainers, and agents) affects system performance, and proposes a method that adjusts these numbers automatically to find the best resource allocation. This not only improves the training speed but also ensures the stability and convergence of the system.

### Main Contributions

- **GA3C Architecture**: Proposes a new hybrid CPU/GPU architecture. By centrally managing and optimizing the prediction and training processes, it significantly improves the training speed of the A3C algorithm.
- **Dynamic Scheduling Strategy**: Introduces a strategy that dynamically adjusts the number of predictors, trainers, and agents to suit different hardware environments and task requirements, so that the system can reach its best performance under different conditions.
- **Performance Evaluation**: Verifies, through detailed experiments, the performance improvement of GA3C under different system configurations. The speedup over a pure CPU implementation is especially significant for large-scale deep neural networks.

In conclusion, by optimizing the computational architecture and scheduling strategy of the A3C algorithm, this paper addresses the traditional A3C algorithm's inability to use the GPU efficiently and provides a more efficient solution for deep reinforcement learning tasks.
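The queue-based design described above, in which agents feed a prediction queue and a training queue that predictors and trainers consume in batches, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the GA3C implementation: `dummy_policy`, the per-agent reply queues, the batch size, and all other names are assumptions made for the example, and the stand-in "model" returns a uniform policy instead of running a TensorFlow network on a GPU.

```python
import queue
import random
import threading

NUM_AGENTS = 4        # hypothetical sizes, chosen only for this sketch
STEPS_PER_AGENT = 20
NUM_ACTIONS = 3
TRAIN_BATCH = 8

prediction_q = queue.Queue()  # (agent_id, state) requests from agents
training_q = queue.Queue()    # (state, action, reward) experiences from agents
reply_qs = [queue.Queue() for _ in range(NUM_AGENTS)]  # policy replies, one per agent


def dummy_policy(states):
    """Stand-in for one batched forward pass of the network: uniform policy."""
    return [[1.0 / NUM_ACTIONS] * NUM_ACTIONS for _ in states]


def predictor(stop):
    """Batch pending prediction requests and send each agent its policy."""
    while not stop.is_set():
        try:
            batch = [prediction_q.get(timeout=0.05)]
        except queue.Empty:
            continue
        # Drain whatever else is waiting so one model call serves many agents.
        while True:
            try:
                batch.append(prediction_q.get_nowait())
            except queue.Empty:
                break
        agent_ids, states = zip(*batch)
        for agent_id, probs in zip(agent_ids, dummy_policy(states)):
            reply_qs[agent_id].put(probs)


def trainer(stop, stats):
    """Collect experiences into batches; a real trainer would update the model."""
    while not (stop.is_set() and training_q.empty()):
        try:
            batch = [training_q.get(timeout=0.05)]
        except queue.Empty:
            continue
        while len(batch) < TRAIN_BATCH:
            try:
                batch.append(training_q.get_nowait())
            except queue.Empty:
                break
        stats["updates"] += 1
        stats["samples"] += len(batch)


def agent(agent_id, rng):
    """Toy environment loop: request a policy, act, submit the experience."""
    state = 0.0
    for _ in range(STEPS_PER_AGENT):
        prediction_q.put((agent_id, state))
        probs = reply_qs[agent_id].get()  # block until the predictor answers
        action = rng.choices(range(NUM_ACTIONS), weights=probs)[0]
        reward = rng.random()             # placeholder reward
        training_q.put((state, action, reward))
        state += 1.0


def run():
    stop = threading.Event()
    stats = {"updates": 0, "samples": 0}
    workers = [threading.Thread(target=predictor, args=(stop,)),
               threading.Thread(target=trainer, args=(stop, stats))]
    agents = [threading.Thread(target=agent, args=(i, random.Random(i)))
              for i in range(NUM_AGENTS)]
    for t in workers + agents:
        t.start()
    for t in agents:
        t.join()
    stop.set()  # agents are done; let the workers drain their queues and exit
    for t in workers:
        t.join()
    return stats


if __name__ == "__main__":
    print(run())
```

The point of the sketch is the decoupling: agents never call the model directly, so a single predictor can serve many agents with one batched call, and the trainer consumes experiences at its own pace. Varying the number of agent, predictor, and trainer threads is the kind of knob that the dynamic scheduling strategy described above tunes automatically.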

Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

Efficient Parallel Methods for Deep Reinforcement Learning

Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup

Asynchronous Methods for Deep Reinforcement Learning

A FPGA Accelerator of Distributed A3C Algorithm with Optimal Resource Deployment

Asynchronous Advantage Actor-Critic Agent for Starcraft II

Playing First-Person-Shooter Games with A3C-Anticipator Network Based Agents Using Reinforcement Learning.

Optimal Elevator Group Control via Deep Asynchronous Actor–Critic Learning

ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

A Framework for Mapping DRL Algorithms with Prioritized Replay Buffer onto Heterogeneous Platforms

Double A3C: Deep Reinforcement Learning on OpenAI Gym Games

Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach

Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers

Applying Online Expert Supervision in Deep Actor-Critic Reinforcement Learning.

GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning

Implementation of value based curiosity mechanism in Reinforcement Learning algorithm based on A3C

Recursive Least Squares Advantage Actor-Critic Algorithms

Gpu-Accelerated Extended Classifier System

Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system