Abstract: We consider a number of questions related to tradeoffs between reward and regret in repeated gameplay between two agents. To facilitate this, we introduce a notion of $\textit{generalized equilibrium}$ which allows for asymmetric regret constraints and yields polytopes of feasible values for each agent and pair of regret constraints; we show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents. As a central example, we highlight the case where one agent is no-swap-regret and the other's regret is unconstrained. We show that this captures an extension of $\textit{Stackelberg}$ equilibria with a matching optimal value, and that there exists a wide class of games where a player can significantly increase their utility by deviating from a no-swap-regret algorithm against a no-swap-regret learner (in fact, almost any game without pure Nash equilibria is of this form). Additionally, we use generalized equilibria to consider tradeoffs in terms of the opponent's choice of algorithm. We give a tight characterization of the maximal reward obtainable against $\textit{some}$ no-regret learner, yet we also exhibit a class of games in which this is bounded away from the value obtainable against the class of common "mean-based" no-regret algorithms. Finally, we consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown. Again we show tradeoffs depending on the opponent's learning algorithm: the Stackelberg strategy is learnable in exponential time with any no-regret agent (and in polynomial time with any no-$\textit{adaptive}$-regret agent) for any game where it is learnable via queries, and there are games where it is learnable in polynomial time against any no-swap-regret agent but requires exponential time against a mean-based no-regret agent.
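As a rough illustration of the Stackelberg benchmark the abstract refers to, the sketch below computes the leader's (strong) Stackelberg value in a game where the leader has exactly two actions. The leader commits to playing its first action with probability p, the follower best-responds with ties broken in the leader's favor, and the leader maximizes over p. This is the standard commitment computation, not the construction from the work summarized above. The function name `stackelberg_value_2xn` and the payoff-matrix layout are hypothetical choices made for this example. For leaders with more than two actions, one would instead solve one linear program per follower action.

```python
from fractions import Fraction
from itertools import combinations


def stackelberg_value_2xn(leader, follower):
    """Leader's strong Stackelberg value when the leader has exactly two actions.

    leader[i][j] and follower[i][j] are the payoffs when the leader plays row i
    (i in {0, 1}) and the follower plays column j. The leader commits to playing
    row 0 with probability p. The follower best-responds, breaking ties in the
    leader's favor. Returns (value, p) as exact fractions.
    """
    # Exact arithmetic, so follower indifference is detected without rounding.
    leader = [[Fraction(x) for x in row] for row in leader]
    follower = [[Fraction(x) for x in row] for row in follower]
    n = len(leader[0])

    def expected(matrix, p, j):
        # Payoff of column j against the commitment (p, 1 - p).
        return p * matrix[0][j] + (1 - p) * matrix[1][j]

    # Follower payoffs are linear in p. The regions where a column is a best
    # response are therefore intervals whose endpoints are 0, 1, or a point
    # where two follower payoff lines cross. The leader's payoff is linear on
    # each region, so its maximum is attained at one of these candidate points.
    candidates = {Fraction(0), Fraction(1)}
    for a, b in combinations(range(n), 2):
        slope = (follower[0][a] - follower[1][a]) - (follower[0][b] - follower[1][b])
        if slope != 0:
            p = (follower[1][b] - follower[1][a]) / slope
            if 0 <= p <= 1:
                candidates.add(p)

    best = None
    for p in sorted(candidates):
        top = max(expected(follower, p, j) for j in range(n))
        # Among the follower's best responses at p, take the leader's favorite.
        value = max(expected(leader, p, j) for j in range(n)
                    if expected(follower, p, j) == top)
        if best is None or value > best[0]:
            best = (value, p)
    return best


# Leader rows (U, D) vs follower columns (L, R). U is dominant for the leader,
# so the unique pure Nash equilibrium (U, L) pays the leader only 2.
print(stackelberg_value_2xn([[2, 4], [1, 3]], [[1, 0], [0, 2]]))
# -> (Fraction(11, 3), Fraction(2, 3))
```

Committing to U with probability 2/3 makes the follower indifferent between the two columns, and the favorable tie-break leads it to play R. The leader then earns 11/3 instead of its Nash payoff of 2. For matching pennies, which has no pure Nash equilibrium, the same function returns value 0 at p = 1/2. The code computes only the benchmark value. It does not simulate any no-regret or no-swap-regret learning dynamics.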

Learning in games with continuous action sets and unknown payoff functions

On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

The equivalence of dynamic and strategic stability under regularized learning in games

Convergent Learning Algorithms for Unknown Reward Games

A unified stochastic approximation framework for learning in games

Uncoupled and Convergent Learning in Monotone Games under Bandit Feedback

No-regret Learning for Repeated Non-Cooperative Games with Lossy Bandits

Learning to Control Unknown Strongly Monotone Games

No-Regret Learning in Time-Varying Zero-Sum Games

On Gradient-Based Learning in Continuous Games

Is Learning in Games Good for the Learners?

On the robustness of learning in games with stochastically perturbed payoff observations

Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Convergence of Policy Gradient Methods for Nash Equilibria in General-sum Stochastic Games

Game-theoretical control with continuous action sets

Learning to Play Against Unknown Opponents

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

Penalty-Regulated Dynamics and Robust Learning Procedures in Games

A Necessary and Sufficient Condition Beyond Monotonicity for Convergence of the Gradient Play in Continuous Games

Convergence of Learning Dynamics in Stackelberg Games

Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information