Abstract: Recent breakthrough results by Dagan, Daskalakis, Fishelson and Golowich [2023] and Peng and Rubinstein [2023] established an efficient algorithm attaining at most $\epsilon$ swap regret over extensive-form strategy spaces of dimension $N$ in $N^{\tilde O(1/\epsilon)}$ rounds. On the other extreme, Farina and Pipis [2023] developed an efficient algorithm for minimizing the weaker notion of linear-swap regret in $\mathsf{poly}(N)/\epsilon^2$ rounds. In this paper, we take a step toward bridging the gap between those two results. We introduce the set of $k$-mediator deviations, which generalize the untimed communication deviations recently introduced by Zhang, Farina and Sandholm [2024] to the case of having multiple mediators. We develop parameterized algorithms for minimizing the regret with respect to this set of deviations in $N^{O(k)}/\epsilon^2$ rounds. This closes the gap in the sense that $k=1$ recovers linear swap regret, while $k=N$ recovers swap regret. Moreover, by relating $k$-mediator deviations to low-degree polynomials, we show that regret minimization against degree-$k$ polynomial swap deviations is achievable in $N^{O(kd)^3}/\epsilon^2$ rounds, where $d$ is the depth of the game, assuming constant branching factor. For a fixed degree $k$, this is polynomial for Bayesian games and quasipolynomial more broadly when $d = \mathsf{polylog} N$ -- the usual balancedness assumption on the game tree.

What problem does this paper attempt to address?

This paper addresses how to efficiently minimize Φ-regret with respect to low-degree swap deviations in extensive-form games. Specifically, it seeks methods for computing approximate correlated equilibria in a reasonable amount of time in games with a very large number of pure strategies. Earlier algorithms handle linear-swap regret, but for more general notions of swap regret, in particular those involving low-degree polynomial deviations, efficient solutions were still lacking.

### Main contributions of the paper

1. **Introduction of k-mediator deviations**: This is a new class of deviations that generalizes the untimed communication deviations recently proposed by Zhang, Farina and Sandholm to the setting with multiple mediators.
2. **Development of parameterized algorithms**: These algorithms minimize regret with respect to k-mediator deviations in $N^{O(k)} / \epsilon^2$ rounds. Setting $k = 1$ recovers linear-swap regret, and setting $k = N$ recovers full swap regret.
3. **Connection between low-degree polynomial deviations and low-depth decision trees**: Through this connection, the paper proves that minimizing regret against degree-$k$ polynomial swap deviations is feasible, assuming a constant branching factor, and provides a concrete complexity bound.
4. **The concept of an expected fixed point**: This is a relaxation of the traditional fixed-point concept that significantly reduces the computational cost. The authors show how to compute a fixed point in expectation, thereby avoiding the need to solve linear systems and improving computational efficiency.

### Specific problem description

In extensive-form games, a player's strategy space is very large, which makes it extremely difficult to compute a correlated equilibrium directly. Traditional fixed-point computations are often PPAD-hard (i.e., believed to be computationally intractable) in this setting. The paper therefore proposes a way to bypass this obstacle: by introducing expected fixed points and k-mediator deviations, it significantly reduces the computational complexity.

### Mathematical formula representation

Some of the key formulas mentioned in the paper include:

- **Definition of expected fixed point**: \[ \mathbb{E}_{x \sim \pi}[\phi(x) - x] \approx 0 \] where $\pi$ is a probability distribution over strategies and $\phi$ is a deviation function.
- **Complexity analysis**: \[ N^{O(kd)^3} / \epsilon^2 \] rounds, where $N$ is the dimension of the strategy space, $k$ is the degree of the polynomial, $d$ is the depth of the game tree, and $\epsilon$ is the precision parameter.

Through these innovations, the paper provides a new and more efficient approach to regret minimization in extensive-form games.
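
To make the expected fixed point condition above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: the map `phi` is a hypothetical nonlinear deviation on a three-point probability simplex, and the helper names are invented for this example. It uses a simple telescoping observation: if $\pi$ is uniform over the iterates $x_0, \phi(x_0), \dots, \phi^{L-1}(x_0)$, then $\mathbb{E}_{x \sim \pi}[\phi(x) - x] = (\phi^{L}(x_0) - x_0)/L$, whose $\ell_1$ norm on the simplex is at most $2/L$. No linear system or exact fixed point is computed.

```python
def phi(x):
    """Hypothetical nonlinear deviation map on the probability simplex (an assumption)."""
    # Rotate the coordinates, then square and renormalise so the output stays on the simplex.
    rotated = x[1:] + x[:1]
    squared = [(v + 0.1) ** 2 for v in rotated]
    total = sum(squared)
    return [v / total for v in squared]


def expected_fixed_point(phi, x0, num_iters):
    """Return the iterates x0, phi(x0), ..., phi^(L-1)(x0); pi is uniform over them."""
    support = [x0]
    for _ in range(num_iters - 1):
        support.append(phi(support[-1]))
    return support


def expected_residual(phi, support):
    """Compute E_{x ~ pi}[phi(x) - x] for pi uniform over `support`."""
    n = len(support[0])
    residual = [0.0] * n
    for x in support:
        y = phi(x)
        for i in range(n):
            residual[i] += (y[i] - x[i]) / len(support)
    return residual


x0 = [0.7, 0.2, 0.1]
L = 200  # number of iterates; the residual bound shrinks like 2 / L
support = expected_fixed_point(phi, x0, L)
residual = expected_residual(phi, support)
l1_norm = sum(abs(r) for r in residual)
print(l1_norm, "<=", 2 / L)
```

Increasing `L` tightens the approximation at the cost of a larger support for $\pi$, which is the trade-off that replaces exact fixed-point computation in this sketch.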

Efficient $Φ$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games

A Lower Bound on Swap Regret in Extensive-Form Games

Fast swap regret minimization and applications to approximate correlated equilibria

From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces

Near-Optimal $Φ$-Regret Learning in Extensive-Form Games

Evolutionary Dynamics and $Φ$-Regret Minimization in Games

Forecasting for Swap Regret for All Downstream Agents

Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games

Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

Regret-Minimizing Double Oracle for Extensive-Form Games

Integrating Dynamic Weighted Approach with Fictitious Play and Pure Counterfactual Regret Minimization for Equilibrium Finding

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

Lazy-CFR: a Fast Regret Minimization Algorithm for Extensive Games with Imperfect Information.

Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

On the Computational Efficiency of Adaptive and Dynamic Regret Minimization

Refined Regret for Adversarial MDPs with Linear Function Approximation

Imization for extensive games with imperfect information

Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent

Regret Minimization via Saddle Point Optimization