Efficient $Φ$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games

Brian Hu Zhang, Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm
2024-02-17
Abstract: Recent breakthrough results by Dagan, Daskalakis, Fishelson and Golowich [2023] and Peng and Rubinstein [2023] established an efficient algorithm attaining at most $\epsilon$ swap regret over extensive-form strategy spaces of dimension $N$ in $N^{\tilde O(1/\epsilon)}$ rounds. On the other extreme, Farina and Pipis [2023] developed an efficient algorithm for minimizing the weaker notion of linear-swap regret in $\mathsf{poly}(N)/\epsilon^2$ rounds. In this paper, we take a step toward bridging the gap between those two results. We introduce the set of $k$-mediator deviations, which generalize the untimed communication deviations recently introduced by Zhang, Farina and Sandholm [2024] to the case of having multiple mediators. We develop parameterized algorithms for minimizing the regret with respect to this set of deviations in $N^{O(k)}/\epsilon^2$ rounds. This closes the gap in the sense that $k=1$ recovers linear swap regret, while $k=N$ recovers swap regret. Moreover, by relating $k$-mediator deviations to low-degree polynomials, we show that regret minimization against degree-$k$ polynomial swap deviations is achievable in $N^{O(kd)^3}/\epsilon^2$ rounds, where $d$ is the depth of the game, assuming constant branching factor. For a fixed degree $k$, this is polynomial for Bayesian games and quasipolynomial more broadly when $d = \mathsf{polylog} N$ -- the usual balancedness assumption on the game tree.
Computer Science and Game Theory
What problem does this paper attempt to address?
This paper addresses how to efficiently minimize Φ-regret with respect to low-degree swap deviations in extensive-form games. The underlying goal is to compute approximate correlated equilibria in reasonable time in games with a very large number of pure strategies. Earlier work handles the two extremes: linear-swap regret can be minimized in \( \mathsf{poly}(N) / \epsilon^2 \) rounds, while full swap regret requires \( N^{\tilde O(1/\epsilon)} \) rounds. The paper targets the gap between them, in particular deviations described by low-degree polynomials.

### Main contributions of the paper

1. **Introduction of k-mediator deviations**: A new class of deviations that generalizes the untimed communication deviations recently proposed by Zhang, Farina and Sandholm to the setting with multiple mediators.
2. **Development of parameterized algorithms**: These algorithms minimize regret with respect to k-mediator deviations in \( N^{O(k)} / \epsilon^2 \) rounds. Setting \( k = 1 \) recovers linear-swap regret, and \( k = N \) recovers full swap regret.
3. **Connection between low-degree polynomial deviations and low-depth decision trees**: Using this connection, the paper shows that low-degree polynomial swap regret can be minimized under stated assumptions on the game tree, and gives an explicit bound on the number of rounds.
4. **The concept of an expected fixed point**: A relaxation of the traditional fixed-point concept that significantly reduces computational cost. The authors show how to compute a fixed point in expectation, which avoids the need to solve linear systems.

### Specific problem description

In extensive-form games, a player's strategy space is very large, which makes it difficult to compute a correlated equilibrium directly. Traditional approaches rely on computing a fixed point of the deviation map, which is in general PPAD-hard (that is, believed to be computationally intractable). The paper bypasses this obstacle by combining expected fixed points with k-mediator deviations.

### Mathematical formula representation

Key formulas mentioned in the paper include:

- **Definition of expected fixed point**:
  \[ \mathbb{E}_{x \sim \pi}[\phi(x) - x] \approx 0 \]
  where \(\pi\) is a probability distribution over strategies and \(\phi\) is a deviation function.
- **Complexity analysis**:
  \[ N^{O(kd)^3} / \epsilon^2 \]
  where \(N\) is the dimension of the strategy space, \(k\) is the degree of the polynomial, \(d\) is the depth of the game tree (assuming constant branching factor), and \(\epsilon\) is the precision parameter.

Together, these ideas give a more efficient approach to regret minimization in extensive-form games for deviation sets between linear-swap and full swap regret.
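To make the expected fixed point condition concrete, the sketch below uses one simple construction: iterate \(\phi\) from an arbitrary starting point and take \(\pi\) to be the uniform distribution over the first \(T\) iterates. The sum of \(\phi(x_t) - x_t\) telescopes, so \(\mathbb{E}_{x \sim \pi}[\phi(x) - x] = (x_{T+1} - x_1)/T\), whose norm is at most the diameter of the strategy set divided by \(T\). This is an illustration under stated assumptions, not the paper's implementation: the function names, the toy deviation map `toy_phi`, and the use of a probability simplex instead of an extensive-form strategy space are all hypothetical choices made for the example.

```python
import numpy as np


def iterate_points(phi, x0, num_iters):
    """Return x_1, ..., x_T with x_{t+1} = phi(x_t), stacked as rows.

    The uniform distribution over these rows is the candidate
    expected fixed point.
    """
    points = [np.asarray(x0, dtype=float)]
    for _ in range(num_iters - 1):
        points.append(phi(points[-1]))
    return np.stack(points)


def expected_residual(phi, points):
    """Compute E_{x ~ uniform(points)}[phi(x) - x]."""
    return np.mean([phi(x) - x for x in points], axis=0)


def toy_phi(x):
    """Hypothetical nonlinear map from the simplex to itself (illustration only)."""
    y = np.roll(x, 1) ** 2 + 0.05
    return y / y.sum()


if __name__ == "__main__":
    T = 200
    x1 = np.array([1.0, 0.0, 0.0])
    pts = iterate_points(toy_phi, x1, T)
    res = expected_residual(toy_phi, pts)
    # Telescoping: the residual equals (x_{T+1} - x_1) / T.
    print(np.linalg.norm(res), np.sqrt(2) / T)
```

No linear system or exact fixed point of `toy_phi` is ever computed; the residual shrinks at rate \(1/T\) purely because of the telescoping sum, regardless of whether the iterates themselves converge.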