Abstract: A popular approach for solving zero-sum games is to maintain populations of policies to approximate the Nash Equilibrium (NE). Previous studies have shown that the Policy Space Response Oracles (PSRO) algorithm is an effective multi-agent reinforcement learning framework for solving such games. However, repeatedly training new policies from scratch to approximate the Best Response (BR) to opponents' mixed policies at each iteration is both inefficient and costly. While some PSRO variants initialize a new policy by inheriting from past BR policies, this approach limits the exploration of new policies, especially against challenging opponents. To address this issue, we propose Fusion-PSRO, which employs policy fusion to initialize policies that better approximate the BR. Policy fusion selects high-quality base policies from the meta-NE and fuses them into a new policy through model averaging. This allows the initialized policy to incorporate multiple expert policies, making it easier to handle difficult opponents than inheriting from past BR policies or initializing from scratch. Moreover, our method only modifies the policy initialization phase, so it can be applied to nearly all PSRO variants without additional training overhead. Our experiments on non-transitive matrix games, Leduc Poker, and the more complex Liar's Dice demonstrate that Fusion-PSRO enhances the performance of nearly all PSRO variants, achieving lower exploitability.

What problem does this paper attempt to address?

The paper addresses two problems that arise when training policies for zero-sum games with multi-agent reinforcement learning frameworks such as PSRO:

1. **Inefficiency**: Traditional PSRO methods train a new policy from scratch in each iteration to approximate the best response (BR) to the opponent's mixed policy, which is both time-consuming and costly.
2. **Insufficient exploration**: Some PSRO variants initialize new policies by inheriting from past BR policies, but this limits the exploration of new policies, especially against challenging opponents.

To overcome these problems, the paper proposes the Fusion-PSRO framework, which initializes new policies through policy fusion. It selects high-quality base policies from the meta-Nash equilibrium (meta-NE) and fuses them into a new policy through weighted model averaging. The resulting initialization handles difficult opponents better, and because only the policy initialization phase changes, no extra training overhead is added.

### Main Contributions

1. **Policy Fusion Method**: A policy fusion method based on the meta-Nash equilibrium is proposed, which generates a new initial policy as a weighted average of multiple high-quality base policies.
2. **Performance Improvement**: Experimental results show that Fusion-PSRO can significantly reduce the exploitability of various PSRO variants on non-transitive matrix games, Leduc Poker, and the more complex Liar's Dice.
3. **Theoretical Analysis**: The paper theoretically analyzes the improvements in utility and policy exploration brought by Nash-weighted average initialization.

### Experimental Validation

The paper validates Fusion-PSRO through experiments on three benchmarks: non-transitive matrix games, Leduc Poker, and Liar's Dice. The results show that Fusion-PSRO not only achieves lower exploitability but also explores the policy space more comprehensively in non-transitive matrix games, which supports the conclusions of the theoretical analysis.
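The policy fusion step summarized above can be illustrated with a minimal sketch. It assumes each policy is a dictionary of NumPy parameter arrays with identical shapes, and that "high-quality" means a meta-NE probability at or above a threshold. The helper name `fuse_policies`, the `min_prob` threshold, and the toy parameters are all hypothetical; the paper's actual selection rule and network architecture are not given here.

```python
import numpy as np


def fuse_policies(param_sets, meta_ne, min_prob=0.05):
    """Nash-weighted parameter averaging (illustrative, not the paper's code).

    param_sets: list of dicts mapping parameter name -> np.ndarray,
                one dict per policy in the population (same shapes).
    meta_ne:    probability the meta-NE assigns to each policy.
    min_prob:   assumed selection threshold for "high-quality" policies.
    """
    meta_ne = np.asarray(meta_ne, dtype=float)

    # Select base policies that carry enough meta-NE probability mass.
    keep = np.flatnonzero(meta_ne >= min_prob)
    if keep.size == 0:
        raise ValueError("no policy passes the selection threshold")

    # Renormalize the meta-NE probabilities over the selected policies.
    weights = meta_ne[keep] / meta_ne[keep].sum()

    # Average every parameter tensor with the renormalized weights.
    fused = {}
    for name in param_sets[keep[0]]:
        fused[name] = sum(w * param_sets[i][name] for w, i in zip(weights, keep))
    return fused


# Toy population of three "policies" with two parameters each.
population = [
    {"w": np.array([1.0, 0.0]), "b": np.array([0.0])},
    {"w": np.array([0.0, 1.0]), "b": np.array([2.0])},
    {"w": np.array([5.0, 5.0]), "b": np.array([9.0])},
]
meta_ne = [0.6, 0.4, 0.0]  # the third policy has no support in the meta-NE

fused = fuse_policies(population, meta_ne)
print(fused["w"], fused["b"])
```

In this toy case the third policy is filtered out, and the fused parameters are a 0.6/0.4 blend of the first two. In a PSRO loop, such fused parameters would serve only as the starting point for best-response training, which is why the approach leaves the rest of the training procedure unchanged.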

Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles
