Abstract: We consider a number of questions related to tradeoffs between reward and regret in repeated gameplay between two agents. To facilitate this, we introduce a notion of $\textit{generalized equilibrium}$ which allows for asymmetric regret constraints and yields polytopes of feasible values for each agent and each pair of regret constraints. We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents. As a central example, we highlight the case where one agent is no-swap-regret and the other's regret is unconstrained. We show that this captures an extension of $\textit{Stackelberg}$ equilibria with a matching optimal value, and that there exists a wide class of games where a player can significantly increase their utility by deviating from a no-swap-regret algorithm against a no-swap-regret learner (in fact, almost any game without pure Nash equilibria is of this form). Additionally, we make use of generalized equilibria to consider tradeoffs in terms of the opponent's algorithm choice. We give a tight characterization of the maximal reward obtainable against $\textit{some}$ no-regret learner, yet we also show a class of games in which this is bounded away from the value obtainable against the class of common "mean-based" no-regret algorithms. Finally, we consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown. Again we show tradeoffs depending on the opponent's learning algorithm: the Stackelberg strategy is learnable in exponential time with any no-regret agent (and in polynomial time with any no-$\textit{adaptive}$-regret agent) for any game where it is learnable via queries, and there are games where it is learnable in polynomial time against any no-swap-regret agent but requires exponential time against a mean-based no-regret agent.

Learning to Play General-Sum Games against Multiple Boundedly Rational Agents

Generalized Principal-Agent Problem with a Learning Agent

A Risk-Averse Equilibrium for Multi-Agent Systems

Is Learning in Games Good for the Learners?

Uncoupled Bandit Learning towards Rationalizability: Benchmarks, Barriers, and Algorithms

Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning

Learning in Multi-Objective Public Goods Games with Non-Linear Utilities

A Generalized Training Approach for Multiagent Learning

Posterior Sampling for Multi-Agent Reinforcement Learning: Solving Extensive Games with Imperfect Information

Learning Generalizable Risk-Sensitive Policies to Coordinate in Decentralized Multi-Agent General-Sum Games

Tractable Equilibrium Computation in Markov Games through Risk Aversion

Stochastic Principal-Agent Problems: Efficient Computation and Learning

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning Via Best Response

Representation Learning for General-sum Low-rank Markov Games

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning

Neural Auto-Curricula in Two-Player Zero-Sum Games

ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations

The Danger Of Arrogance: Welfare Equilibra As A Solution To Stackelberg Self-Play In Non-Coincidental Games

Bounded Rationality Equilibrium Learning in Mean Field Games