Abstract: We study the problem of learning a Nash equilibrium (NE) in Markov games, a cornerstone of multi-agent reinforcement learning (MARL). In particular, we focus on infinite-horizon adversarial team Markov games (ATMGs), in which agents that share a common reward function compete against a single opponent, the adversary. These games unify two-player zero-sum Markov games and Markov potential games, resulting in a setting that encompasses both collaboration and competition. Kalogiannis et al. (2023a) provided an efficient equilibrium computation algorithm for ATMGs, which presumes knowledge of the reward and transition functions and has no sample complexity guarantees. We contribute a learning algorithm that uses MARL policy gradient methods, with iteration and sample complexity that is polynomial in the approximation error $\epsilon$ and the natural parameters of the ATMG, resolving the main caveats of the solution by Kalogiannis et al. (2023a). Previously, learning algorithms for NE were known for two-player zero-sum Markov games and Markov potential games, but not for ATMGs. Seen through the lens of min-max optimization, computing an NE in these games constitutes a nonconvex-nonconcave saddle-point problem. Min-max optimization has been studied extensively. Nevertheless, the case of nonconvex-nonconcave landscapes remains elusive: in full generality, finding saddle points is computationally intractable (Daskalakis et al., 2021). We circumvent this intractability by developing techniques that exploit the hidden structure of the objective function via a nonconvex-concave reformulation. However, this reformulation introduces the challenge of a feasibility set with coupled constraints. We tackle these challenges by establishing novel techniques for optimizing weakly-smooth nonconvex functions, extending the framework of Devolder et al. (2014).

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

A Risk-Averse Equilibrium for Multi-Agent Systems

No-Regret Learning of Nash Equilibrium for Black-Box Games via Gaussian Processes

Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning

Inverse learning of black-box aggregator for robust Nash equilibrium

Meta-game equilibrium for multi-agent reinforcement learning

Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics

Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem

Social Optimum Equilibrium Selection for Distributed Multi-Agent Optimization

When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

Equilibrium Selection for Multi-agent Reinforcement Learning: A Unified Framework

Learning Stationary Nash Equilibrium Policies in [math]-Player Stochastic Games with Independent Chains

Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory

Independent Learning in Stochastic Games

Near-Optimal Learning of Extensive-Form Games with Imperfect Information

Learning in Multi-Player Stochastic Games

Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

Tractable Equilibrium Computation in Markov Games through Risk Aversion

A Generalized Training Approach for Multiagent Learning