Satisfaction and Regret in Stackelberg Games

Langford White, Duong Nguyen, Hung Nguyen
2024-08-21
Abstract: This paper introduces the new concept of (follower) satisfaction in Stackelberg games and compares the standard Stackelberg game with its satisfaction version. Simulation results are presented which suggest that the follower adopting satisfaction generally increases leader utility. This important new result is proven for the case where the leader's commitment strategies are restricted to be deterministic (pure strategies). The paper then addresses the application of regret-based algorithms to the Stackelberg problem. Although it is known that the follower adopts a no-regret position in a Stackelberg solution, this is not generally the case for the leader. The paper examines the convergence behaviour of unconditional and conditional regret matching (RM) algorithms in the Stackelberg setting. It shows that, in the examples considered, these algorithms converge either to any pure Nash equilibria of the simultaneous-move game, or to some mixed strategies which do not have the "no-regret" property. In one case, convergence of the conditional RM algorithm over both players to a solution "close" to the Stackelberg case was observed. The paper argues that further research in this area, in particular when applied in the satisfaction setting, could be fruitful.
Computer Science and Game Theory
What problem does this paper attempt to address?
The paper introduces and explores the new concept of "follower satisfaction" in Stackelberg games and compares the resulting game with the standard Stackelberg game. It focuses on three aspects:

1. **Introducing the satisfaction model**: The paper proposes a model in which the follower pursues satisfaction rather than maximizing utility. The follower is considered satisfied once its utility reaches a given minimum level.
2. **Comparing the leader's utility under different models**: Through proof and simulation, the paper shows that when the follower uses satisfaction as its reward setting, the leader's utility is usually higher than in the standard Stackelberg game. When the leader is restricted to pure strategies, this result is proven rigorously.
3. **Application of the regret-matching algorithm**: The paper also studies regret-matching algorithms in the Stackelberg game. Although the follower is known to adopt a no-regret strategy in a Stackelberg solution, this is not generally true of the leader. The paper examines the convergence behavior of unconditional and conditional regret-matching algorithms in the Stackelberg setting and shows that, in the examples considered, they converge either to pure Nash equilibria or to some mixed strategies.

### Key formulas

- **Follower's best response function**:
  \[ B(\pi_\ell)=\arg\max_{t\in S_f}P_{\pi_\ell}[U_f(\cdot,t)\geq U_f^-] \]
  where \(U_f^-\) is the follower's minimum utility threshold.
- **Leader's optimal mixed strategy**:
  \[ \pi_\ell^*=\arg\max_{\pi\in\Delta(S_\ell)}E_\pi[\max_{t\in B(\pi)}U_\ell(\cdot,t)] \]
- **Linear programming constraints**:
  \[ \forall t\in S_f,\ \sum_{s\in S_\ell}\pi(s)\left(I[U_f(s,t)\geq U_f^-]-I[U_f(s,s_f)\geq U_f^-]\right)\leq 0 \]
  Read term by term, the constraint requires that no follower action \(t\) be satisfied with higher probability under \(\pi\) than the fixed follower action \(s_f\), that is, \(s_f\in B(\pi)\).

### Main conclusions

- When the follower uses satisfaction as its reward setting, the leader's utility is usually higher than in the standard Stackelberg game.
- Under the pure-strategy restriction, this conclusion is proven rigorously.
- The regret-matching algorithms can converge to pure Nash equilibria or to some mixed strategies in some cases, but not always.

These results bear on strategic decision-making, equilibrium analysis, and policy design, with potential applications in areas such as pricing strategies, network security, traffic management, and healthcare.
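The satisfaction best response from the key formulas, together with the pure-strategy leader commitment for which the result is proven, can be illustrated with a minimal Python sketch. The function names, the 2×2 payoff matrices, and the threshold value are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch also assumes that ties inside the best-response set are broken in the leader's favour, as the inner maximum in the leader's formula suggests.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows: leader actions, columns: follower actions


def satisfaction_prob(pi: Sequence[float], U_f: Matrix, t: int, threshold: float) -> float:
    """P_pi[U_f(., t) >= threshold]: probability that follower action t is satisfying."""
    return sum(p for p, row in zip(pi, U_f) if row[t] >= threshold)


def satisfaction_best_responses(pi: Sequence[float], U_f: Matrix,
                                threshold: float, tol: float = 1e-12) -> List[int]:
    """B(pi): follower actions that maximise the satisfaction probability."""
    probs = [satisfaction_prob(pi, U_f, t, threshold) for t in range(len(U_f[0]))]
    best = max(probs)
    return [t for t, p in enumerate(probs) if p >= best - tol]


def best_pure_commitment(U_l: Matrix, U_f: Matrix,
                         threshold: Optional[float] = None) -> Tuple[int, float]:
    """Best pure leader commitment and its value.

    threshold=None  -> standard Stackelberg follower (utility maximiser).
    threshold=float -> satisfaction follower with that minimum utility level.
    Ties in the follower's response set are broken in the leader's favour
    (an assumption of this sketch).
    """
    n = len(U_l)
    best_s, best_val = -1, float("-inf")
    for s in range(n):
        if threshold is None:
            top = max(U_f[s])
            responses = [t for t, u in enumerate(U_f[s]) if u == top]
        else:
            one_hot = [1.0 if i == s else 0.0 for i in range(n)]
            responses = satisfaction_best_responses(one_hot, U_f, threshold)
        val = max(U_l[s][t] for t in responses)
        if val > best_val:
            best_s, best_val = s, val
    return best_s, best_val


# Hypothetical 2x2 game, for illustration only.
U_leader = [[1, 5], [2, 0]]
U_follower = [[3, 2], [1, 4]]

standard = best_pure_commitment(U_leader, U_follower)
satisfied = best_pure_commitment(U_leader, U_follower, threshold=2)
```

In this toy game the standard follower answers leader action 0 with its utility-maximising action, giving the leader a value of 1, whereas a follower with threshold 2 accepts either action after leader action 0, which lets the leader obtain 5. Under a pure commitment the utility-maximising responses are always contained in the satisfaction response set, so the leader's value cannot decrease in this sketch; the mixed-strategy case would instead require solving the linear programs defined by the constraints above.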