Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization

Xiangsen Wang, Haoran Xu, Yinan Zheng, Xianyuan Zhan
2023-11-07
Abstract: Offline reinforcement learning (RL) has received considerable attention in recent years due to its attractive capability of learning policies from offline datasets without environmental interactions. Despite some success in the single-agent setting, offline multi-agent RL (MARL) remains a challenge. The large joint state-action space and the coupled multi-agent behaviors pose extra complexities for offline policy optimization. Most existing offline MARL studies simply apply offline data-related regularizations on individual agents, without fully considering the multi-agent system at the global level. In this work, we present OMIGA, a new offline multi-agent RL algorithm with implicit global-to-local value regularization. OMIGA provides a principled framework to convert global-level value regularization into equivalent implicit local value regularizations and simultaneously enables in-sample learning, thus elegantly bridging multi-agent value decomposition and policy learning with offline regularizations. Based on comprehensive experiments on the offline multi-agent MuJoCo and StarCraft II micro-management tasks, we show that OMIGA achieves superior performance over the state-of-the-art offline MARL methods in almost all tasks.
Machine Learning, Multiagent Systems
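The abstract claims that global-level value regularization can be converted into equivalent local regularizations. That idea can be illustrated under simplifying assumptions. These assumptions are ours and may not match the paper's exact formulation:

- a KL-style regularizer toward the behavior policy μ, whose optimal policy reweights behavior actions by exp((Q_tot - V_tot)/α);
- a behavior policy that factorizes across agents;
- a linear decomposition Q_tot = Σ w_i Q_i + b and V_tot = Σ w_i V_i + b with w_i > 0.

Under these assumptions, the global soft value equals the linear mix of local soft values computed with temperature α/w_i. The global weight for any joint action also equals the product of per-agent weights exp(w_i (Q_i - V_i)/α). In other words, each agent sees an implicit local regularizer of strength α/w_i. All function names and random inputs below are hypothetical.

```python
import itertools
import math
import random


def soft_value(q_values, probs, tau):
    """tau * log E_{a ~ mu}[exp(Q(a) / tau)] for a discrete behavior policy mu."""
    return tau * math.log(sum(p * math.exp(q / tau) for q, p in zip(q_values, probs)))


def global_soft_value(local_qs, local_mus, weights, bias, alpha):
    """Soft value of Q_tot(a) = sum_i w_i Q_i(a_i) + b, by enumerating all joint
    actions under a factorized behavior policy mu(a) = prod_i mu_i(a_i)."""
    action_ranges = [range(len(q)) for q in local_qs]
    total = 0.0
    for joint in itertools.product(*action_ranges):
        mu = math.prod(m[a] for m, a in zip(local_mus, joint))
        q_tot = sum(w * q[a] for w, q, a in zip(weights, local_qs, joint)) + bias
        total += mu * math.exp(q_tot / alpha)
    return alpha * math.log(total)


def local_weight(q, v, w, alpha):
    """Per-agent factor exp(w_i (Q_i - V_i) / alpha): a local KL-style
    regularizer whose effective strength is alpha / w_i (assumed form)."""
    return math.exp(w * (q - v) / alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha, bias = 1.5, 0.3
    n_agents, n_act = 3, 4
    # Hypothetical local Q-tables, behavior policies, and mixing weights.
    qs = [[rng.uniform(-1, 1) for _ in range(n_act)] for _ in range(n_agents)]
    mus = []
    for _ in range(n_agents):
        raw = [rng.random() + 0.1 for _ in range(n_act)]
        total = sum(raw)
        mus.append([r / total for r in raw])
    ws = [rng.uniform(0.2, 2.0) for _ in range(n_agents)]

    # Local soft values use temperature alpha / w_i, then are mixed linearly.
    vs = [soft_value(q, mu, alpha / w) for q, mu, w in zip(qs, mus, ws)]
    v_tot_mixed = sum(w * v for w, v in zip(ws, vs)) + bias
    v_tot_global = global_soft_value(qs, mus, ws, bias, alpha)
    print(f"V_tot from global: {v_tot_global:.6f}, from local mix: {v_tot_mixed:.6f}")

    # For one joint action, the global weight equals the product of local weights.
    joint = (0, 1, 2)
    q_tot = sum(w * q[a] for w, q, a in zip(ws, qs, joint)) + bias
    g = math.exp((q_tot - v_tot_global) / alpha)
    p = math.prod(local_weight(q[a], v, w, alpha) for q, v, w, a in zip(qs, vs, ws, joint))
    print(f"global weight: {g:.6f}, product of local weights: {p:.6f}")
```

The brute-force enumeration in `global_soft_value` is exactly the exponential joint-action cost that makes global regularization expensive. The local form avoids it, because it touches only each agent's own actions.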
What problem does this paper attempt to address?
This paper addresses how to learn effective policies from pre-collected datasets in offline multi-agent reinforcement learning (offline MARL), without any interaction with the environment. In multi-agent systems, the joint state-action space grows exponentially with the number of agents, and agent behaviors are coupled. Both properties make offline data hard for existing methods to exploit. The main challenges are:

1. **Size of the joint state-action space**: As the number of agents increases, the joint state-action space grows exponentially. This makes global-level regularization difficult to compute and can make the resulting constraints very sparse, especially when the offline dataset has limited size and coverage.
2. **Limitations of local regularization**: Existing offline MARL methods usually apply data-related regularization only at the local (per-agent) level, without fully considering the global information of the multi-agent system. Such local regularization can be overly conservative, and it does not guarantee that the optimized local policies remain optimal at the global level.
3. **Coordination and credit assignment**: Existing methods do not adequately capture coordinated behavior and credit assignment among agents, which limits the effectiveness of policy learning.

To address these problems, the paper proposes a new offline MARL algorithm, OMIGA (Offline Multi-Agent RL with Implicit Global-to-Local Value Regularization). OMIGA converts global-level value regularization into equivalent implicit local value regularizations. This improves the efficiency and stability of policy learning while retaining global information.

### Main contributions

1. **Global-to-local value regularization**: OMIGA provides a principled framework for converting global-level value regularization into implicit local value regularizations, bridging multi-agent value decomposition and policy learning.
2. **Fully in-sample learning**: OMIGA learns entirely in-sample, without querying out-of-distribution (OOD) action samples, which improves learning stability.
3. **Theoretical analysis and experimental verification**: The paper supports OMIGA with theoretical analysis. Extensive experiments on multi-agent MuJoCo and StarCraft II micromanagement tasks show that it outperforms existing offline MARL methods on almost all tasks.

### Experimental results

The paper evaluates OMIGA on offline multi-agent MuJoCo and StarCraft II micromanagement tasks. OMIGA achieves better performance than the baseline methods on multiple tasks, particularly on some of the more challenging ones. As shown in Table 1, OMIGA performs well in terms of both average return and standard deviation.

### Summary

By introducing implicit global-to-local value regularization, OMIGA addresses key challenges in offline multi-agent reinforcement learning and offers a new approach to policy learning in offline multi-agent systems.
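Fully in-sample learning means the value function is trained only on actions that appear in the dataset. The sketch below illustrates that idea for a single observation. It is not OMIGA's exact objective.

It fits a local value V by minimizing the mean over dataset actions of exp((Q - V)/τ) + V/τ. This is a KL-style, exponential-weighting objective; choosing this particular form is our assumption. Setting the derivative to zero gives V = τ log mean exp(Q/τ). So V becomes a soft maximum of the in-dataset Q values, and no OOD action is ever queried. Treating τ as a local temperature such as α/w_i is also an assumption, and all function names are hypothetical.

```python
import math
import random


def in_sample_value_loss(v, q_data, tau):
    """L(V) = mean over dataset actions of exp((Q - V)/tau) + V/tau.
    Q is evaluated only at actions logged in the dataset (no OOD queries)."""
    return sum(math.exp((q - v) / tau) + v / tau for q in q_data) / len(q_data)


def in_sample_value_grad(v, q_data, tau):
    """dL/dV = (1 - mean exp((Q - V)/tau)) / tau."""
    mean_weight = sum(math.exp((q - v) / tau) for q in q_data) / len(q_data)
    return (1.0 - mean_weight) / tau


def fit_value(q_data, tau, lr=0.5, steps=2000):
    """Plain gradient descent on the in-sample loss, starting from the mean Q.
    The default lr is tuned for tau around 1 and Q values of order 1."""
    v = sum(q_data) / len(q_data)
    for _ in range(steps):
        v -= lr * in_sample_value_grad(v, q_data, tau)
    return v


def soft_value_closed_form(q_data, tau):
    """Exact minimizer of the loss: tau * log mean exp(Q/tau), computed stably."""
    m = max(q_data)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_data) / len(q_data))


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical Q_i values at the actions logged for one observation.
    q_data = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    tau = 1.0  # stands in for a local temperature such as alpha / w_i (assumption)
    v_fit = fit_value(q_data, tau)
    print(f"gradient descent: {v_fit:.6f}, closed form: {soft_value_closed_form(q_data, tau):.6f}")
    # Exponentiated advantages, usable as weights for weighted behavior cloning.
    weights = [math.exp((q - v_fit) / tau) for q in q_data]
    print(f"mean weight: {sum(weights) / len(weights):.6f}")  # close to 1 at the optimum
```

At the optimum the exponentiated advantages average to 1. They can therefore serve directly as sample weights in a weighted behavior-cloning policy update. That pairing is shown here as an assumption, not as OMIGA's confirmed update rule.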