Neural Auto-Curricula

Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

DOI: https://doi.org/10.48550/arXiv.2106.02745

2021-11-01

Abstract: When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game-theoretic principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework, Neural Auto-Curricula (NAC), that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve performance competitive with, or even better than, state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.
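
For reference, the exploitability that both players aim to minimise can be written down for a two-player zero-sum game as follows. This is the standard textbook form, given here as an assumption; the symbols $u_i$ (player $i$'s expected payoff) and $\pi = (\pi_1, \pi_2)$ (the joint strategy) are illustrative, and the paper's own notation and normalisation may differ.

```latex
\mathrm{Expl}(\pi)
  = \sum_{i \in \{1,2\}} \Big( \max_{\pi_i'} u_i(\pi_i', \pi_{-i}) - u_i(\pi_i, \pi_{-i}) \Big)
  = \max_{\pi_1'} u_1(\pi_1', \pi_2) + \max_{\pi_2'} u_2(\pi_1, \pi_2'),
\qquad \text{since } u_1 + u_2 = 0 .
```

Under this definition $\mathrm{Expl}(\pi) \ge 0$, with equality exactly when $\pi$ is a Nash equilibrium, so a lower value means neither player can gain much by deviating.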

Artificial Intelligence, Multiagent Systems

What problem does this paper attempt to address?

This paper addresses the problem of automatically discovering effective self-training rules (auto-curricula) for multi-agent reinforcement learning (MARL) in two-player zero-sum games. Specifically, it aims to develop a framework that can determine the update rules for "who to compete with" (i.e., opponent selection) and "how to beat them" (i.e., finding the best-response strategy) without relying on explicitly human-designed game-theoretic principles. Traditional population-based MARL methods rely on manually designed update rules such as fictitious play and Double Oracle, which demand considerable human effort and face challenges on large or complex games. To address this, the paper introduces a new framework, Neural Auto-Curricula (NAC), which uses meta-gradient descent to discover the learning update rule automatically. NAC parameterizes the opponent-selection module with neural networks and the best-response module with optimization subroutines, and updates their parameters solely through interaction with the game engine, with the objective of minimizing each player's exploitability. Experimental results show that, even without human design, the MARL algorithms discovered by NAC achieve performance comparable to or better than state-of-the-art population-based game solvers (such as PSRO) across multiple game environments. NAC also generalizes from small games to large games: for example, after being trained on Kuhn Poker, it outperforms PSRO on Leduc Poker.
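
The population-based loop described above can be sketched on a toy game. The example below is a minimal illustration, not the paper's implementation: it uses rock-paper-scissors, a hand-designed uniform opponent mixture (the fictitious-play rule), and an exact best response over pure actions. All function names are hypothetical. In NAC, the hand-written `uniform_meta_solver` is the part that would be replaced by a learned neural network trained with meta-gradients, which this sketch does not attempt.

```python
# Row player's payoff in rock-paper-scissors (symmetric, zero-sum).
PAYOFF = [
    [0, -1, 1],   # rock     vs (rock, paper, scissors)
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_payoffs(opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [sum(p * q for p, q in zip(row, opponent_strategy)) for row in PAYOFF]


def best_response(opponent_strategy):
    """'How to beat them': the pure action with the highest expected payoff."""
    values = expected_payoffs(opponent_strategy)
    return max(range(len(values)), key=lambda a: values[a])


def uniform_meta_solver(population):
    """'Who to compete with': hand-designed uniform weights (fictitious play)."""
    return [1.0 / len(population)] * len(population)


def mixture_to_strategy(population, weights):
    """Collapse a weighted population of pure actions into one mixed strategy."""
    strategy = [0.0] * len(PAYOFF)
    for action, weight in zip(population, weights):
        strategy[action] += weight
    return strategy


def best_response_gain(strategy):
    """Payoff a best-responding opponent earns against `strategy`.

    The game is symmetric with value 0, so this is zero exactly at the Nash
    equilibrium; it is half of the two-player exploitability sum.
    """
    return max(expected_payoffs(strategy))


def run(iterations, meta_solver=uniform_meta_solver):
    """Grow a population by repeatedly best-responding to the opponent mixture."""
    population = [0]  # start with a single agent that always plays rock
    for _ in range(iterations):
        weights = meta_solver(population)
        opponent = mixture_to_strategy(population, weights)
        population.append(best_response(opponent))
    final = mixture_to_strategy(population, meta_solver(population))
    return population, final, best_response_gain(final)


if __name__ == "__main__":
    _, strategy, gain = run(2000)
    print("final mixture:", [round(p, 3) for p in strategy])
    print("best-response gain:", round(gain, 4))
```

The `meta_solver` argument is the single point where the update rule is chosen: passing a different function changes the curriculum without touching the rest of the loop, which is the separation between opponent selection and best response that the summary describes.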

Neural Auto-Curricula in Two-Player Zero-Sum Games.

MARNet: Backdoor Attacks Against Cooperative Multi-Agent Reinforcement Learning

Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games.

MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

A Generalized Training Approach for Multiagent Learning

Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning

Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

From mimic to counteract: a two-stage reinforcement learning algorithm for Google research football

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games

A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning

Co-Evolving Multi-Agent Transfer Reinforcement Learning Via Scenario Independent Representation

Multi-Agent Reinforcement Learning Algorithm Based On Neural Networks

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Modeling opponent learning in multiagent repeated games

A Learnable Noise Exploration Method for Multi-Agent Reinforcement Learning