Cooperative Open-ended Learning Framework for Zero-shot Coordination

Yang Li, Shao Zhang, Jichen Sun, Yali Du, Ying Wen, Xinbing Wang, Wei Pan
2024-02-29
Abstract: Zero-shot coordination in cooperative artificial intelligence (AI) remains a significant challenge, which means effectively coordinating with a wide range of unseen partners. Previous algorithms have attempted to address this challenge by optimizing fixed objectives within a population to improve strategy or behaviour diversity. However, these approaches can result in a loss of learning and an inability to cooperate with certain strategies within the population, known as cooperative incompatibility. To address this issue, we propose the Cooperative Open-ended LEarning (COLE) framework, which constructs open-ended objectives in cooperative games with two players from the perspective of graph theory to assess and identify the cooperative ability of each strategy. We further specify the framework and propose a practical algorithm that leverages knowledge from game theory and graph theory. Furthermore, an analysis of the learning process of the algorithm shows that it can efficiently overcome cooperative incompatibility. The experimental results in the Overcooked game environment demonstrate that our method outperforms current state-of-the-art methods when coordinating with different-level partners. Our demo is available at https://sites.google.com/view/cole-2023.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The problem this paper attempts to solve is **zero-shot coordination (ZSC) in cooperative artificial intelligence (AI)**: how to enable agents to collaborate effectively with a wide range of unseen partners. Specifically, the paper focuses on overcoming the **cooperative incompatibility** problem that affects existing multi-agent methods.

### Problem Background

1. **Limitations of Self-Play (SP)**:
   - Self-play iteratively improves a strategy by having the agent play with copies of itself. Although it can converge to an equilibrium of the game, the resulting strategy often forms specific behavior patterns and conventions, leaving it unable to adapt to unseen partner strategies.
2. **Limitations of Population-Based Training (PBT)**:
   - PBT breaks the conventions of self-play by maintaining a set of strategies and optimizing rewards within the population to promote zero-shot coordination. However, optimizing a fixed objective (such as the expected reward within the population) may fail to improve the cooperative ability of some strategy pairs at the same pace as the rest, which results in cooperative incompatibility.

### The Method Proposed in the Paper

To solve the above problems, the paper proposes the **Cooperative Open-ended LEarning (COLE) framework**. Its main contributions are:

1. **Introduction of Graphic-Form Games (GFGs) and Preferred Graphic-Form Games (P-GFGs)**:
   - Cooperative tasks are reformulated from the perspectives of graph theory and game theory: strategies are represented as nodes in a graph, and the weight of an edge represents the cooperative payoff between two strategies. A P-GFG keeps, for each node, only the edge with the maximum cooperative payoff, which helps to evaluate cooperative incompatibility.
2. **Development of the Cooperative Open-ended Learning Framework (COLE)**:
   - The COLE framework iteratively generates new strategies that approximate best responses, in particular against the cooperative incompatibility distribution. It combines the Shapley value from cooperative game theory with the PageRank algorithm from graph theory to evaluate the cooperative ability and adaptability of strategies.
3. **Theoretical Guarantee**:
   - The paper proves that the COLE framework converges to the optimal strategy at a Q-sublinear rate when in-degree centrality is used as the preference evaluation metric.

### Experimental Verification

The paper reports experiments in the Overcooked game environment. The results show that the COLE framework outperforms existing state-of-the-art (SOTA) methods when collaborating with partners of different skill levels, and that it can effectively overcome cooperative incompatibility.

### Summary

This paper addresses the zero-shot coordination problem in cooperative AI. By introducing graphic-form games and the cooperative open-ended learning framework, it overcomes the cooperative incompatibility problem found in existing methods.
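The graph view described in the method section can be made concrete with a small example. The following is a minimal sketch, not the authors' implementation: it assumes a hand-written payoff matrix, builds a preference graph in which each strategy keeps only the edge to its best partner, and counts in-degrees as a simple preference measure. The payoff values and the names `PAYOFF`, `preference_graph`, and `in_degree` are hypothetical, and allowing self-loops with ties broken by lowest index is an assumption made for illustration.

```python
# Illustrative sketch only: the payoff values and function names below are
# hypothetical and are not taken from the paper.

# PAYOFF[i][j] = cooperative return strategy i obtains when paired with
# strategy j (diagonal entries are self-play returns).
PAYOFF = [
    [6, 4, 9, 1],
    [4, 5, 8, 2],
    [9, 8, 7, 3],
    [1, 2, 3, 10],
]


def preference_graph(payoff):
    """Map each strategy to the partner that gives it the highest payoff.

    Self-loops are allowed (assumption); ties go to the lowest partner index.
    """
    n = len(payoff)
    return {i: max(range(n), key=lambda j: payoff[i][j]) for i in range(n)}


def in_degree(preferred):
    """Count how many strategies point to each node in the preference graph."""
    counts = [0] * len(preferred)
    for partner in preferred.values():
        counts[partner] += 1
    return counts


preferred = preference_graph(PAYOFF)
degrees = in_degree(preferred)
print("preferred partner per strategy:", preferred)
print("in-degree per strategy:", degrees)
```

With these made-up values, strategies 0 and 1 both prefer strategy 2, strategy 2 prefers strategy 0, and strategy 3 prefers only itself, giving in-degrees of 1, 0, 2, and 1. Strategy 3 scores well in self-play but poorly with every other partner, which is the kind of pattern the summary describes as cooperative incompatibility.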