Pareto-Optimal Algorithms for Learning in Games

Eshwar Ram Arunachaleswaran, Natalie Collina, Jon Schneider
2024-02-15
Abstract: We study the problem of characterizing optimal learning algorithms for playing repeated games against an adversary with unknown payoffs. In this problem, the first player (called the learner) commits to a learning algorithm against a second player (called the optimizer), and the optimizer best-responds by choosing the optimal dynamic strategy for their (unknown but well-defined) payoff. Classic learning algorithms (such as no-regret algorithms) provide some counterfactual guarantees for the learner, but might perform much more poorly than other learning algorithms against particular optimizer payoffs.
Computer Science and Game Theory
What problem does this paper attempt to address?
This paper studies how to characterize optimal learning algorithms for repeated games against an opponent whose payoffs are unknown. Specifically, the first player (the learner) commits to a learning algorithm, and the second player (the optimizer) best-responds by choosing the dynamic strategy most favorable to itself. Classic no-regret algorithms give the learner some counterfactual guarantees, but against particular optimizer payoffs they may perform far worse than other learning algorithms.

To address this, the paper introduces asymptotically Pareto-optimal learning algorithms. Intuitively, a learning algorithm is Pareto-optimal if no other algorithm performs at least as well against every optimizer and strictly better (by at least \( \Omega(T) \)) against some optimizers. The results show that well-known no-regret algorithms such as Multiplicative Weights and Follow The Regularized Leader (FTRL) are Pareto-dominated. Although no-regret alone does not ensure Pareto-optimality, the paper proves that a stronger property, no-swap-regret, is a sufficient condition for it.

### Specific Problem Description

1. **Background and Motivation**:
   - In repeated games, the learner faces an optimizer with unknown payoffs.
   - Classic no-regret algorithms provide certain guarantees, but they may perform poorly against some optimizer payoffs.
2. **Research Objectives**:
   - Characterize optimal learning algorithms for playing against an opponent with unknown payoffs.
   - Introduce and study the concept of asymptotically Pareto-optimal learning algorithms.
3. **Main Contributions**:
   - Propose the concept of asymptotically Pareto-optimal learning algorithms.
   - Prove that no-swap-regret algorithms are Pareto-optimal, and that many common no-regret algorithms (such as FTRL) are Pareto-dominated.
   - Introduce the asymptotic menu as a tool for characterizing the behavior of learning algorithms.

### Definition of Asymptotically Pareto-optimal Learning Algorithms

Given a fixed learner payoff \( u_L \), a learning algorithm \( A' \) asymptotically Pareto-dominates a learning algorithm \( A \) if for all optimizer payoffs \( u_O \), we have:

\[ V_L(A', u_O) \geq V_L(A, u_O) \]

and there exists a set of optimizer payoffs \( u_O \) with positive measure such that:

\[ V_L(A', u_O) > V_L(A, u_O) \]

where \( V_L(A, u_O) \) is the learner's asymptotic average payoff per round when using algorithm \( A \) against an optimizer with payoff \( u_O \). A learning algorithm \( A \) is asymptotically Pareto-optimal if it is not asymptotically Pareto-dominated by any other learning algorithm.

### Main Results

- Many no-regret algorithms (such as FTRL) are Pareto-dominated.
- No-swap-regret algorithms are Pareto-optimal.
- There are infinitely many different Pareto-optimal learning algorithms.
- All no-swap-regret algorithms share the same asymptotic menu, and it is contained in the asymptotic menu of every no-regret algorithm.

Through these results, the paper emphasizes the importance of no-swap-regret algorithms in strategic environments and provides a framework for designing new learning algorithms.
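The summary contrasts no-regret with no-swap-regret without defining either quantity. As a minimal sketch, assuming the standard textbook definitions (this is not code from the paper), the two regret notions can be computed for a finite play history as follows. The function names, the payoff matrix `u`, and the example history are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def external_regret(u, learner_actions, optimizer_actions):
    """Payoff of the best fixed action in hindsight minus the realized payoff.

    u[i][j] is the learner's payoff when the learner plays i and the
    optimizer plays j (an illustrative convention, not the paper's notation).
    """
    realized = sum(u[a][b] for a, b in zip(learner_actions, optimizer_actions))
    best_fixed = max(sum(row[b] for b in optimizer_actions) for row in u)
    return best_fixed - realized


def swap_regret(u, learner_actions, optimizer_actions):
    """Gain from the best action-by-action swap applied in hindsight.

    For each action a the learner played, look only at the rounds where a
    was played and ask how much better the single best replacement for a
    would have done on those rounds; sum these gains over all a.
    """
    rounds_by_action = defaultdict(list)
    for a, b in zip(learner_actions, optimizer_actions):
        rounds_by_action[a].append(b)

    total = 0
    for a, opp_actions in rounds_by_action.items():
        realized = sum(u[a][b] for b in opp_actions)
        best_swap = max(sum(row[b] for b in opp_actions) for row in u)
        total += best_swap - realized
    return total


# Hypothetical 2x2 game in which the learner is paid 1 for matching.
u = [[1, 0],
     [0, 1]]
learner = [0, 1, 0, 1]
optimizer = [1, 0, 1, 0]  # the learner mismatches in every round

print(external_regret(u, learner, optimizer))  # 2
print(swap_regret(u, learner, optimizer))      # 4
```

Swap regret is never smaller than external regret, since replacing every action with the same fixed action is one of the allowed swaps. An algorithm is called no-regret (respectively no-swap-regret) when the corresponding quantity grows sublinearly in the number of rounds \( T \), which is why no-swap-regret is the stronger property.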