Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games

Naming Liu, Mingzhi Wang, Youzhi Zhang, Yaodong Yang, Bo An, Ying Wen
2024-03-01
Abstract: Two-team zero-sum games are one of the most important paradigms in game theory. In this paper, we focus on finding an unexploitable equilibrium in large team games. An unexploitable equilibrium is a worst-case policy, where members in the opponent team cannot increase their team reward by taking any policy, e.g., cooperatively changing to other joint policies. As an optimal unexploitable equilibrium in two-team zero-sum games, correlated-team maxmin equilibrium remains unexploitable even in the worst case where players in the opponent team can achieve arbitrary cooperation through a joint team policy. However, finding such an equilibrium in large games is challenging due to the impracticality of evaluating the exponentially large number of joint policies. To solve this problem, we first introduce a general solution concept called restricted correlated-team maxmin equilibrium, which uses a sample factor to address the impossibility of evaluating all joint policies while avoiding an exploitation problem under the incomplete joint policy evaluation. We then develop an efficient sequential correlation mechanism, based on which we propose an algorithm for approximating the unexploitable equilibrium in large games. We show that our approach achieves lower exploitability than the state-of-the-art baseline when encountering opponent teams with different exploitation ability in large team games including Google Research Football.
Computer Science and Game Theory, Multiagent Systems
What problem does this paper attempt to address?
The paper addresses the problem of finding an unexploitable equilibrium in large-scale two-team zero-sum games. Specifically, it looks for a worst-case strategy in large team games: one against which members of the opposing team cannot increase their team reward by adopting any other strategy, including by cooperatively switching to another joint strategy.

### Core challenges of the problem

1. **Computational complexity**: In large team games, it is impractical to evaluate all possible joint strategies because the number of joint strategies grows exponentially with the team size.
2. **Limitations of existing methods**: Existing algorithms (such as Team-PSRO) cannot converge to a truly unexploitable equilibrium in large-scale team games because they restrict the team joint strategy space.

### Solutions proposed in the paper

To address these challenges, the paper introduces a new solution concept, the restricted correlated-team maxmin equilibrium (rCTME), and solves the problem in the following ways:

1. **Introducing the sample factor**: A sample factor limits how fast the number of joint strategies to be evaluated grows, which avoids the exponential growth.
2. **Developing the sequential correlation mechanism**: An efficient sequential correlation mechanism is proposed, and an algorithm (S-PSRO) built on it approximates the unexploitable equilibrium in large-scale team games.

### Formula representation

- **Definition of CTME**: \[ R_j(\pi_j^*, \pi_{-j}^*) \geq R_j(\pi_j, \pi_{-j}^*) \quad \forall \pi_j \in \Pi_j \] where $\Pi_j$ is the joint strategy space of team $T_j$.
- **Definition of rCTME**: \[ R_j(\pi_j^*, \pi_{-j}^*) \geq R_j(\pi_j, \pi_{-j}^*) \quad \forall \pi_j \in I_j \cup C_j \] where $I_j$ and $C_j$ are the individual deviation strategy space and the correlated deviation strategy space, respectively.

Through these methods, the paper aims to provide a more efficient and practical algorithm for finding unexploitable equilibria in large-scale two-team zero-sum games.
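
The inequality in the two definitions can be checked mechanically on a small example. The following is a minimal sketch, not the paper's algorithm: the payoff table, the candidate strategy, and the helper names `team_reward` and `profitable_deviations` are all hypothetical. For illustration only, the restricted set is taken to be the joint actions in which a single member changes its action; this is an assumption and not the paper's definition of $I_j$ or $C_j$. The sketch shows how checking only a restricted deviation set can miss a profitable joint deviation that the full set $\Pi_j$ reveals.

```python
from itertools import product

def team_reward(R, pi_team, pi_opp):
    """Expected team reward for mixed strategies pi_team (rows) and pi_opp (columns)."""
    return sum(p * q * R[a][b]
               for a, p in enumerate(pi_team)
               for b, q in enumerate(pi_opp))

def profitable_deviations(R, pi_team_star, pi_opp_star, deviations, tol=1e-9):
    """Return the deviations pi_j that violate
    R_j(pi_j*, pi_-j*) >= R_j(pi_j, pi_-j*)."""
    base = team_reward(R, pi_team_star, pi_opp_star)
    return [d for d in deviations
            if team_reward(R, d, pi_opp_star) > base + tol]

# Hypothetical toy game: team j has two members with two actions each,
# giving 4 joint actions (rows); the opposing team has 2 actions (columns).
joint_actions = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)
R = [
    [1.0, 0.0],  # joint action (0, 0)
    [0.0, 0.0],  # joint action (0, 1)
    [0.0, 0.0],  # joint action (1, 0)
    [0.0, 2.0],  # joint action (1, 1)
]

pi_opp_star = [0.5, 0.5]
pi_team_star = [1.0, 0.0, 0.0, 0.0]  # candidate: always play (0, 0)

def one_hot(k, n=4):
    """Pure joint strategy as a distribution over the n joint actions."""
    return [1.0 if i == k else 0.0 for i in range(n)]

# Full set of pure joint deviations (mixtures cannot beat the best pure one).
full_set = [one_hot(k) for k in range(len(joint_actions))]
# Assumed restricted set: only one member changes its action from (0, 0).
restricted_set = [one_hot(1), one_hot(2)]

missed = profitable_deviations(R, pi_team_star, pi_opp_star, restricted_set)
found = profitable_deviations(R, pi_team_star, pi_opp_star, full_set)
print(len(missed), len(found))
```

In this toy game the candidate earns 0.5, each single-member change earns 0, and the joint switch to (1, 1) earns 1.0, so the restricted check reports no violation while the full check reports one. This is the kind of gap that evaluating correlated deviations is meant to close.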