Abstract: Under the constraint of a no-regret follower, will the players in a two-player Stackelberg game still reach a Stackelberg equilibrium? We first show that when the follower's strategy is either reward-average or transform-reward-average, the two players can always reach the Stackelberg equilibrium. We then extend this result to show that the players can achieve the Stackelberg equilibrium in the two-player game under the no-regret constraint. We also establish a strict upper bound on the difference in the follower's utility with and without the no-regret constraint. Moreover, in constant-sum two-player Stackelberg games with no-regret action sequences, we show that the total optimal utility of the game remains bounded.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: when the follower is subject to a "no-regret" constraint, can the players in a two-person Stackelberg game still reach the Stackelberg equilibrium? Specifically, the paper explores the following issues:

1. **Existence of Stackelberg equilibrium**: When the follower's strategy is reward-average or transform-reward-average, can the two players always reach the Stackelberg equilibrium?
2. **Influence of no-regret constraint**: Under the no-regret constraint, can players always reach the Stackelberg equilibrium in a two-person game? In addition, the paper establishes a strict upper bound on the utility difference of the follower with and without the no-regret constraint.
3. **Optimal utility in constant-sum games**: In a constant-sum two-person Stackelberg game, if the follower's action sequence is no-regret, is the total optimal utility of the game still bounded?

### Main contributions

- **Theoretical proof**: The paper proves that under certain loose conditions, the two players can always reach the Stackelberg equilibrium (see Theorem 9). Further, it shows that under the no-regret constraint, players can consistently achieve the Stackelberg equilibrium.
- **Upper bound of utility difference**: The paper establishes a strict upper bound that describes the utility difference of the follower with and without the no-regret constraint.
- **Utility guarantee in constant-sum games**: In a constant-sum two-person Stackelberg game, if the follower's action sequence is no-regret, then the total optimal utility of the game remains bounded.

### Experimental verification

Through theoretical analysis and experimental verification, the paper shows that in the multi-agent reinforcement learning framework, the leader can use a reinforcement learning algorithm and the follower can use a no-regret algorithm, so that the entire system reaches the Stackelberg equilibrium.
The experimental results show that in various matrix game environments, the strategies under the no-regret constraint can come close to or even reach the Stackelberg equilibrium.

### Formula representation

The key formulas involved in the paper are the following:

- **Definition of regret value**:
  \[ \text{Reg}_T(\vec{a}_F)=\max_{\vec{a}_F}\mathbb{E}_{d_F(s_0)}\left[\sum_{t = 0}^T R_t^F(s_t^F,a_t^F,\bar{s}_t^L,\bar{a}_t^L)\mid a_t^F\sim\pi_F(a\mid s_t^F,a_t^L),s_0^F\sim d_F(s_0),\bar{s}_t^L,\bar{a}_t^L\right]-\sum_{t = 0}^T\bar{R}_t^F \]
- **Best reward operator**:
  \[ \mu^*_{\vec{a}_F}R_T^F=\max_{\vec{a}_F}\mathbb{E}_{d_F(s_0)}\left[\sum_{t = 0}^T R_t^F(s_t^F,a_t^F,\bar{s}_t^L,\bar{a}_t^L)\mid a_t^F\sim\pi_F(a\mid s_t^F,a_t^L),s_0^F\sim d_F(s_0),\bar{s}_t^L,\bar{a}_t^L\right] \]
- **Definition of no-regret property**:
  \[ \mathbb{E}\left[\mu^*_{\vec{a}_F}R_T^F-\sum_{t = 0}^T\bar{R}_t^F\right]=o(T) \]

The regret value is therefore the best reward operator minus the reward the follower actually collected, and the no-regret property requires this gap to grow sublinearly in \(T\) in expectation. These formulas help to understand the no-regret property of the follower and its influence on the Stackelberg equilibrium.
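
To make the regret definition concrete, the following is a minimal sketch in a stateless matrix game, which simplifies away the state-dependent rewards in the paper's formulas. The payoff matrices, the function names, and the choice of Hedge (multiplicative weights) as the follower's no-regret algorithm are all assumptions made for illustration; the paper's own algorithms and environments may differ. The leader commits to the pure action that is best when the follower best-responds, and the follower's expected regret is measured as the best fixed action's total reward minus the reward actually earned.

```python
import numpy as np

# Hypothetical payoffs: rows are leader actions, columns are follower actions.
LEADER_PAYOFF = np.array([[2.0, 4.0], [1.0, 3.0]])
FOLLOWER_PAYOFF = np.array([[1.0, 0.0], [0.0, 1.0]])  # rewards lie in [0, 1]


def pure_stackelberg(leader_payoff, follower_payoff):
    """Best pure leader commitment when the follower best-responds.

    Follower ties are broken by lowest index (an assumption of this sketch).
    """
    best = None
    for a_l in range(leader_payoff.shape[0]):
        a_f = int(np.argmax(follower_payoff[a_l]))
        value = float(leader_payoff[a_l, a_f])
        if best is None or value > best[2]:
            best = (a_l, a_f, value)
    return best


def hedge_regret(reward_rows):
    """Run Hedge on a (T, n_actions) array of reward vectors in [0, 1].

    Returns the expected regret (best fixed action's total reward minus the
    learner's expected total reward) and the last mixed strategy played.
    """
    horizon, n_actions = reward_rows.shape
    eta = np.sqrt(8.0 * np.log(n_actions) / horizon)  # standard Hedge step size
    cumulative = np.zeros(n_actions)
    earned = 0.0
    probs = np.full(n_actions, 1.0 / n_actions)
    for rewards in reward_rows:
        logits = eta * cumulative
        probs = np.exp(logits - logits.max())  # subtract max for stability
        probs /= probs.sum()
        earned += float(probs @ rewards)
        cumulative += rewards
    return float(cumulative.max() - earned), probs


def follower_regret_against_commitment(horizon):
    """Follower runs Hedge while the leader repeats its Stackelberg action."""
    a_l, _, _ = pure_stackelberg(LEADER_PAYOFF, FOLLOWER_PAYOFF)
    reward_rows = np.tile(FOLLOWER_PAYOFF[a_l], (horizon, 1))
    return hedge_regret(reward_rows)


if __name__ == "__main__":
    print("Stackelberg (leader, follower, leader value):",
          pure_stackelberg(LEADER_PAYOFF, FOLLOWER_PAYOFF))
    for horizon in (1000, 4000):
        regret, probs = follower_regret_against_commitment(horizon)
        print(horizon, "average regret:", regret / horizon, "final strategy:", probs)
```

In this toy game the leader's first action dominates, yet committing to the second action yields a higher leader payoff once the follower best-responds. Because the follower's average regret shrinks as the horizon grows, its play concentrates on the best response to the commitment, which is the intuition behind the no-regret follower still reaching the Stackelberg outcome.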

ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners

Satisfaction and Regret in Stackelberg Games

Regret Minimization in Stackelberg Games with Side Information

Responding to Promises: No-regret learning against followers with memory

Is Learning in Games Good for the Learners?

No-Regret Learning for Stackelberg Equilibrium Computation in Newsvendor Pricing Games

Robust No-Regret Learning in Min-Max Stackelberg Games

Watch and Learn: Optimizing from Revealed Preferences Feedback

Follower Agnostic Methods for Stackelberg Games

Solving Strongly Convex and Smooth Stackelberg Games Without Modeling the Follower

Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

Efficient Stackelberg Strategies for Finitely Repeated Games

Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality

Decentralized Online Learning in General-Sum Stackelberg Games

Exploiting a No-Regret Opponent in Repeated Zero-Sum Games

Distributed Stackelberg Equilibrium Seeking for Networked Multi-Leader Multi-Follower Games with A Clustered Information Structure

Stackelberg vs. Nash in the Lottery Colonel Blotto Game

Imitative Follower Deception in Stackelberg Games

A Linear-quadratic Mean-Field Stochastic Stackelberg Differential Game with Random Exit Time

The Application of Non-Cooperative Stackelberg Game Theory in Behavioral Science: Social Optimality with any Number of Players