Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph

Qingxu Fu, Tenghai Qiu, Jianqiang Yi, Zhiqiang Pu, Xiaolin Ai
2024-03-27
Abstract: Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address various complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned in non-hierarchical algorithms are implicit and not interpretable, thereby restricting the integration of existing knowledge. This paper proposes a novel hierarchical MARL model called Hierarchical Cooperation Graph Learning (HCGL) for solving general multi-agent problems. HCGL has three components: a dynamic Extensible Cooperation Graph (ECG) for achieving self-clustering cooperation; a group of graph operators for adjusting the topology of ECG; and an MARL optimizer for training these graph operators. HCGL's key distinction from other MARL models is that the behaviors of agents are guided by the topology of ECG instead of policy neural networks. ECG is a three-layer graph consisting of an agent node layer, a cluster node layer, and a target node layer. To manipulate the ECG topology in response to changing environmental conditions, four graph operators are trained to adjust the edge connections of ECG dynamically. The hierarchical feature of ECG provides a unique approach to merge primitive actions (actions executed by the agents) and cooperative actions (actions executed by the clusters) into a unified action space, allowing us to integrate fundamental cooperative knowledge into an extensible interface. In our experiments, the HCGL model has shown outstanding performance in multi-agent benchmarks with sparse rewards. We also verify that HCGL can easily be transferred to large-scale scenarios with high zero-shot transfer success rates.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses three key problems in multi-agent reinforcement learning (MARL):

1. **Large-scale multi-agent policy exploration problem**:
   - As the number of agents increases, the joint action space expands dramatically, making it difficult to explore enough feasible policies and for agents to escape locally sub-optimal solutions. This problem is especially serious in environments with sparse rewards.
   - By introducing a hierarchical control model and an Extensible Cooperation Graph (ECG), HCGL reduces ineffective exploration behaviors and thereby addresses this problem.
2. **Knowledge fusion problem**:
   - Only a few studies have explored methods of integrating existing knowledge into MARL algorithms. A framework capable of fusing knowledge is crucial for improving RL performance and helps agents learn cooperative behaviors that go beyond the limits of RL alone.
   - HCGL fuses cooperative knowledge by programming basic cooperative behaviors (such as gathering and joint attack) as cooperative actions in the ECG.
3. **Interpretability problem**:
   - In non-hierarchical frameworks, agents learn high-level policies through neural networks, and these policies are difficult to interpret. If agents fail to exhibit the expected cooperative behaviors after training, it is almost impossible to diagnose problems in their policy neural networks.
   - As a graph structure, the ECG is more interpretable: the cooperative behaviors of agents can be visualized and monitored by observing the ECG topology.

To solve these problems, the paper proposes the Hierarchical Cooperation Graph Learning (HCGL) model. The main contributions include:

- Proposing a new graph-based hierarchical model, HCGL, with knowledge fusion capabilities.
- Providing a method for intuitively interpreting agent behaviors through the ECG topology.
- Demonstrating the effectiveness of ECG in large-scale cooperation tasks.
- Showing that HCGL can efficiently transfer policies learned in small-scale tasks to large-scale tasks.

In addition, HCGL introduces four graph operators that dynamically adjust the edge connections of the ECG to adapt to changing environmental conditions. Each graph operator has its own policy neural network and is trained to maximize the reward of the entire multi-agent team.
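
The three-layer ECG described in the abstract (agent nodes, cluster nodes, target nodes) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ToyECG`, its fields, the round-robin initial topology, and the two edit methods `move_agent` and `retarget_cluster` are all hypothetical. In HCGL the edge edits are chosen by four trained graph operators, whose exact rules are not given in this summary; here they are replaced by plain method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyECG:
    """Hypothetical three-layer graph: agent nodes -> cluster nodes -> target nodes."""

    n_agents: int
    n_clusters: int
    n_targets: int
    # agent_to_cluster[a] is the cluster node that agent a is connected to.
    agent_to_cluster: List[int] = field(default_factory=list)
    # cluster_to_target[c] is the target node that cluster c is connected to.
    cluster_to_target: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed initial topology: agents spread round-robin over clusters,
        # and every cluster connected to target 0.
        if not self.agent_to_cluster:
            self.agent_to_cluster = [a % self.n_clusters for a in range(self.n_agents)]
        if not self.cluster_to_target:
            self.cluster_to_target = [0] * self.n_clusters

    def move_agent(self, agent: int, cluster: int) -> None:
        """Rewire one agent-to-cluster edge (stand-in for a learned operator)."""
        if not 0 <= agent < self.n_agents:
            raise IndexError(f"agent {agent} out of range")
        if not 0 <= cluster < self.n_clusters:
            raise IndexError(f"cluster {cluster} out of range")
        self.agent_to_cluster[agent] = cluster

    def retarget_cluster(self, cluster: int, target: int) -> None:
        """Rewire one cluster-to-target edge (stand-in for a learned operator)."""
        if not 0 <= cluster < self.n_clusters:
            raise IndexError(f"cluster {cluster} out of range")
        if not 0 <= target < self.n_targets:
            raise IndexError(f"target {target} out of range")
        self.cluster_to_target[cluster] = target

    def members(self, cluster: int) -> List[int]:
        """Agents currently connected to the given cluster."""
        return [a for a, c in enumerate(self.agent_to_cluster) if c == cluster]

    def agent_targets(self) -> Dict[int, int]:
        """Follow agent -> cluster -> target edges to find each agent's target."""
        return {a: self.cluster_to_target[c] for a, c in enumerate(self.agent_to_cluster)}


if __name__ == "__main__":
    ecg = ToyECG(n_agents=4, n_clusters=2, n_targets=3)
    ecg.move_agent(1, 0)        # agent 1 joins cluster 0
    ecg.retarget_cluster(0, 2)  # cluster 0 now pursues target 2
    print(ecg.members(0))
    print(ecg.agent_targets())
```

Because each agent's assignment is read directly off the graph, inspecting `agent_to_cluster` and `cluster_to_target` is enough to see which agents are grouped together and what each group is directed toward, which is the interpretability argument made above.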