Abstract: Graph optimization problems (such as minimum vertex cover, maximum cut, and the travelling salesman problem) appear in many fields, including the social sciences, power systems, chemistry, and bioinformatics. Recently, deep reinforcement learning (DRL) has shown success in automatically learning good heuristics for graph optimization problems. However, existing RL systems either do not support graph RL environments or do not support multiple GPUs in a distributed setting. The resulting lack of parallelism and scalability limits the ability of reinforcement learning to solve large-scale graph optimization problems. To address these challenges, we develop RL4GO, a high-performance distributed-GPU DRL framework for solving graph optimization problems. RL4GO targets a class of computationally demanding RL problems in which both the RL environment and the policy model are highly computation intensive, whereas traditional reinforcement learning systems often assume that either the RL environment has low time complexity or the policy model is small. In this work, we distribute large-scale graphs across GPUs and use spatial parallelism and data parallelism to achieve scalable performance. We compare and analyze the performance of the two forms of parallelism and show how they differ. To support graph neural network (GNN) layers whose input samples are partitioned across distributed GPUs, we design parallel mathematical kernels that operate on distributed 3D sparse and 3D dense tensors. To handle costly RL environments, we design a parallel graph environment that scales up all environment-related operations. By combining the scalable GNN layers with the scalable RL environment, we develop high-performance parallel training and inference algorithms for RL4GO.
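The difference between the two parallelization strategies can be illustrated with a small, single-process sketch. It assumes that the core operation of a GNN layer is a batched neighbor aggregation A_b H_b over a 3D adjacency tensor (graphs × nodes × nodes) and a 3D feature tensor (graphs × nodes × features); the worker loop only simulates GPUs, dense lists stand in for sparse storage, and all function names are hypothetical rather than RL4GO's API. Data parallelism splits the batch (graph) dimension, while spatial parallelism splits the node dimension of every graph.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense (rows x k) @ (k x cols) product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def aggregate(adj_batch: List[Matrix], feat_batch: List[Matrix]) -> List[Matrix]:
    """Reference single-device aggregation A_b @ H_b for every graph b in the batch."""
    return [matmul(a, h) for a, h in zip(adj_batch, feat_batch)]


def split(n: int, parts: int) -> List[Tuple[int, int]]:
    """Split range(n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        end = start + base + (1 if p < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def data_parallel(adj_batch, feat_batch, workers: int) -> List[Matrix]:
    """Each simulated worker owns whole graphs: the batch dimension is split."""
    out: List[Matrix] = []
    for lo, hi in split(len(adj_batch), workers):
        # Purely local work: no feature exchange, but each worker stores full graphs.
        out.extend(aggregate(adj_batch[lo:hi], feat_batch[lo:hi]))
    return out


def spatial_parallel(adj_batch, feat_batch, workers: int) -> List[Matrix]:
    """Each simulated worker owns a block of node rows of every graph."""
    n = len(adj_batch[0])
    out: List[Matrix] = [[] for _ in adj_batch]
    for lo, hi in split(n, workers):
        for b, (a, h) in enumerate(zip(adj_batch, feat_batch)):
            # Rows lo:hi of A need the features of *all* nodes; on real GPUs this
            # corresponds to gathering H from every device before multiplying.
            out[b].extend(matmul(a[lo:hi], h))
    return out


adj = [[[0, 1, 1], [1, 0, 0], [1, 0, 0]], [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]
feat = [[[1.0], [2.0], [3.0]], [[1.0], [1.0], [1.0]]]
print(aggregate(adj, feat))
print(spatial_parallel(adj, feat, workers=2) == aggregate(adj, feat))
```

In this sketch, data parallelism needs no exchange of features but requires each worker to hold entire graphs, whereas spatial parallelism divides every graph's rows among workers at the price of gathering all node features first; the actual RL4GO kernels and communication pattern may differ.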
Furthermore, we propose two optimization techniques, replay buffer on-the-fly graph generation and adaptive multiple-node selection, to minimize space cost and accelerate reinforcement learning. We also conduct in-depth analyses of parallel efficiency and memory cost and show that the RL4GO algorithms scale to numerous distributed GPUs. Evaluations on large-scale graphs show that 1) RL4GO training and inference achieve good parallel efficiency on 192 GPUs; 2) RL4GO training can be 18 times faster than the state-of-the-art Gorila distributed RL framework [34]; and 3) RL4GO inference achieves a 26-fold improvement over Gorila.
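The abstract names these two optimizations without describing their mechanics. The sketch below is one plausible reading, not RL4GO's implementation, using minimum vertex cover as the example problem. It assumes that "on-the-fly graph generation" means the replay buffer stores a small graph seed instead of the graph and regenerates the graph deterministically when a transition is sampled, and that "adaptive multiple-node selection" means adding the top-k scored nodes per step, with k a fixed fraction of the remaining candidate nodes. The Erdős-Rényi generator, the degree-based score that stands in for a learned GNN Q-value, and all names (`generate_graph`, `SeedReplayBuffer`, `select_nodes`, `greedy_cover`, `fraction`) are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]


def generate_graph(seed: int, n: int = 20, p: float = 0.15) -> Graph:
    """Hypothetical Erdos-Renyi generator: the same seed always rebuilds the same graph."""
    rng = random.Random(seed)
    g: Graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                g[u].add(v)
                g[v].add(u)
    return g


class SeedReplayBuffer:
    """Hypothetical replay buffer that keeps graph seeds instead of graphs."""

    def __init__(self, capacity: int):
        # deque(maxlen=...) silently drops the oldest transition when full.
        self.items = deque(maxlen=capacity)

    def add(self, seed: int, partial_solution, action, reward: float) -> None:
        # The state is (graph, partial solution); only the graph's seed is stored.
        self.items.append((seed, tuple(partial_solution), tuple(action), reward))

    def sample(self, k: int, rng: random.Random) -> List[Tuple[Graph, tuple, tuple, float]]:
        batch = rng.sample(list(self.items), k)
        # Regenerate each graph on the fly; assumes all graphs use the default n and p.
        return [(generate_graph(seed), sol, act, r) for seed, sol, act, r in batch]


def select_nodes(scores: Dict[int, float], candidates: Set[int], fraction: float) -> List[int]:
    """Return the k best candidates, where k shrinks as fewer candidates remain."""
    k = max(1, int(fraction * len(candidates)))
    # Ties are broken by node id so the choice is deterministic.
    return sorted(candidates, key=lambda v: (-scores[v], v))[:k]


def greedy_cover(g: Graph, fraction: float = 0.1) -> Set[int]:
    """Build a vertex cover by adding several nodes per step."""
    cover: Set[int] = set()
    while True:
        uncovered = {(u, v) for u in g for v in g[u]
                     if u < v and u not in cover and v not in cover}
        if not uncovered:
            return cover
        candidates = {x for edge in uncovered for x in edge}
        # Stand-in for a learned GNN Q-value: degree in the uncovered subgraph.
        scores = {v: sum(v in edge for edge in uncovered) for v in candidates}
        cover.update(select_nodes(scores, candidates, fraction))
```

Storing a seed and a few node indices costs a handful of integers per transition instead of a whole adjacency structure, at the price of regenerating graphs during sampling. Selecting several nodes per step shortens episodes on large graphs, because the number of steps no longer grows one-for-one with the solution size.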

Scaling Up Multi-Agent Reinforcement Learning Via Graph Decomposition Invariant Network

Learning Intra-group Cooperation in Multi-agent Systems

Multi-Agent Game Abstraction Via Graph Attention Neural Network

From Few to More: Large-scale Dynamic Multiagent Curriculum Learning

Efficient and scalable reinforcement learning for large-scale network control

Scalable and Transferable Reinforcement Learning for Multi-Agent Mixed Cooperative–Competitive Environments Based on Hierarchical Graph Attention

Scaling Up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph

GAT-MF: Graph Attention Mean Field for Very Large Scale Multi-Agent Reinforcement Learning

A Distributed-GPU Deep Reinforcement Learning System for Solving Large Graph Optimization Problems

Very Large Scale Multi-Agent Reinforcement Learning with Graph Attention Mean Field

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Scalability Bottlenecks in Multi-Agent Reinforcement Learning Systems

Global-localized agent graph convolution for multi-agent reinforcement learning

Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning

Towards Generalizability of Multi-Agent Reinforcement Learning in Graphs with Recurrent Message Passing

Graph Neural Network-based Multi-agent Reinforcement Learning for Resilient Distributed Coordination of Multi-Robot Systems

Multi-Agent Actor-Critic with Hierarchical Graph Attention Network

Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations

Multi-Task Multi-Agent Shared Layers are Universal Cognition of Multi-Agent Coordination

Graph Policy Gradients for Large Scale Robot Control

A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking