Abstract: We study the problem of learning a Nash equilibrium (NE) in Markov games, which is a cornerstone of multi-agent reinforcement learning (MARL). In particular, we focus on infinite-horizon adversarial team Markov games (ATMGs), in which agents that share a common reward function compete against a single opponent, the adversary. These games unify two-player zero-sum Markov games and Markov potential games, resulting in a setting that encompasses both collaboration and competition. Kalogiannis et al. (2023a) provided an efficient equilibrium computation algorithm for ATMGs, which presumes knowledge of the reward and transition functions and has no sample complexity guarantees. We contribute a learning algorithm that utilizes MARL policy gradient methods with iteration and sample complexity that is polynomial in the approximation error $\epsilon$ and the natural parameters of the ATMG, resolving the main caveats of the solution by Kalogiannis et al. (2023a). Previously, learning algorithms for NE were known to exist for two-player zero-sum Markov games and Markov potential games, but not for ATMGs. Seen through the lens of min-max optimization, computing a NE in these games constitutes a nonconvex-nonconcave saddle-point problem. Min-max optimization has received extensive study. Nevertheless, the case of nonconvex-nonconcave landscapes remains elusive: in full generality, finding saddle points is computationally intractable (Daskalakis et al., 2021). We circumvent this intractability by developing techniques that exploit the hidden structure of the objective function via a nonconvex-concave reformulation. However, this reformulation introduces the challenge of a feasibility set with coupled constraints. We tackle these challenges by establishing novel techniques for optimizing weakly-smooth nonconvex functions, extending the framework of Devolder et al. (2014).

Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning

Adaptive algorithm for multi-agent learning optimal cooperative pursuit strategy based on Markov game

A Risk-Averse Equilibrium for Multi-Agent Systems

Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem

Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality

Convex Markov Games: A Framework for Fairness, Imitation, and Creativity in Multi-Agent Learning

Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse Reward Learning with Iterative Reasoning and Cumulative Prospect Theory

Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing

Empirical Policy Optimization for n-Player Markov Games

Tractable Equilibrium Computation in Markov Games through Risk Aversion

Social Optimum Equilibrium Selection for Distributed Multi-Agent Optimization

Maximum-Entropy Multi-Agent Dynamic Games: Forward and Inverse Solutions

Inverse learning of black-box aggregator for robust Nash equilibrium

A Payoff-Based Learning Approach for Nash Equilibrium Seeking in Continuous Potential Games

Rationality-bounded Adaptive Learning in Multi-Agent Dynamic Games

Deep Fictitious Play for Finding Markovian Nash Equilibrium in Multi-Agent Games

A Unified Perspective on Deep Equilibrium Finding

Bounded Rationality Equilibrium Learning in Mean Field Games

Score-Based Equilibrium Learning in Multi-Player Finite Games with Imperfect Information

Differentiable Arbitrating in Zero-sum Markov Games