Scaling Up Multi-Agent Reinforcement Learning Via Graph Decomposition Invariant Network

Hongyi Fu, Jianmin Ji
DOI: https://doi.org/10.1109/ainit61980.2024.10581435
2024-01-01
Abstract: This paper addresses the difficulty of scaling multi-agent systems. As the number of agents increases, the policy search space grows exponentially, and task complexity reaches a level that current mainstream multi-agent algorithms struggle to handle, making it difficult to converge to the optimal strategy in large-scale multi-agent systems. To make learning on large-scale multi-agent tasks smoother, many researchers have introduced curriculum learning, arranging curricula from small to large scale. Compared with training from scratch, this approach inherits the knowledge learned in smaller-scale tasks and makes training smoother. However, realizing multi-agent curriculum learning from small to large scale poses the following difficulties. First, the neural network structure must adapt to observation inputs from agent populations of different scales, rather than being applicable only to a fixed scale. Second, in multi-stage curriculum learning, because the network represents multi-agent systems inefficiently, the knowledge learned in small-scale curricula cannot be effectively utilized in large-scale ones, and the network may even suffer catastrophic forgetting. To address these difficulties, we view multi-agent systems from a graph perspective and derive the Graph Decomposition Invariant (GDI) for the first time. We prove that neural network architectures satisfying the GDI are of great help in realizing efficient curriculum learning. We design a model architecture, called the GDI model, that is suitable for multi-agent curriculum learning and satisfies the proposed graph decomposition invariant. Finally, we design small-scale to large-scale multi-agent tasks on two experimental platforms, StarCraft and Neural MMO, and conduct extensive comparative experiments, which verify that the proposed GDI model substantially improves curriculum learning.