Abstract: We consider a repeated Stackelberg game setup where the leader faces a sequence of followers of unknown types and must learn what commitments to make. While previous works have considered followers that best respond to the commitment announced by the leader in every round, we relax this setup in two ways. Motivated by natural scenarios where the leader's reputation factors into how the followers choose their response, we consider followers with memory. Specifically, we model followers that base their response on not just the leader's current commitment but on an aggregate of their past commitments. In developing learning strategies that the leader can employ against such followers, we make the second relaxation and assume boundedly rational followers. In particular, we focus on followers employing quantal responses. Interestingly, we observe that the smoothness property offered by the quantal response (QR) model helps in addressing the challenge posed by learning against followers with memory. Utilizing techniques from online learning, we develop algorithms that guarantee $O(\sqrt{T})$ regret for quantal responding memory-less followers and $O(\sqrt{BT})$ regret for followers with bounded memory of length $B$ with both scaling polynomially in game parameters.

What problem does this paper attempt to address?

The paper asks how a leader in a repeated Stackelberg game can learn which commitments to make when facing a sequence of followers of unknown types. It relaxes the standard assumption, that every follower best responds to the leader's current commitment, in two ways:

1. **Followers with memory**: Previous studies assume that followers respond only to the leader's current commitment. Here, a follower's response depends on an aggregate of the leader's past commitments as well as the current one, which captures the effect of the leader's reputation.
2. **Boundedly rational followers**: Followers are not assumed to always play a best response. Instead, they follow the quantal response (QR) model, which allows a degree of randomness in their decisions.

### Specific problem description

In this context, the paper poses two core problems:

- **Problem 1**: How does the leader learn what to commit to when facing a sequence of memory-less followers of unknown types?
- **Problem 2**: How does the leader learn what to commit to when facing followers of unknown types that have memory?

### Solutions

To address these problems, the paper develops two algorithms:

1. **Algorithm for memory-less followers**: Against quantal-responding memory-less followers, the leader's regret is $O(\sqrt{T})$, where $T$ is the number of rounds of the game.
2. **Algorithm for followers with memory**: Against quantal-responding followers with bounded memory of length $B$, the leader's regret is $O(\sqrt{BT})$.

Both bounds scale polynomially in the game parameters. The algorithms build on online learning techniques, and the smoothness of the quantal response model is what makes learning against followers with memory tractable. Together they let the leader achieve near-optimal performance over long horizons, even against followers with memory and bounded rationality.

### Mathematical formulas

The key formulas in the paper include:

- The leader's regret against memory-less followers:

  \[ \text{Regret}(H)=\max_{x\in\Delta_N}\left\langle Y(x)^T U^T x, G_H\right\rangle-\sum_{t = 1}^H\left\langle Y(x_t)^T U^T x_t, g_t\right\rangle \]

  where $G_H=\sum_{t = 1}^H g_t$.

- The leader's regret against followers with memory:

  \[ \text{Regret}_M(H)=\max_{x\in\Delta_N}\left\langle Y(x)^T U^T x, G_H\right\rangle-\sum_{t = 1}^H\left\langle Y(z_t)^T U^T x_t, g_t\right\rangle \]

  where $z_t=\frac{1}{b_t}\sum_{\tau = 1}^t a_{t-\tau}x_\tau$ is the time-averaged leader strategy. The only difference from the memory-less case is that the followers' response is evaluated at the aggregate $z_t$ rather than at the current commitment $x_t$.

These formulas measure the leader's learning performance against the two kinds of followers and provide the theoretical basis for the learning algorithms.
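The two regret definitions can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm. The following are assumed here and not taken from the source: `Y(x)` is a matrix whose columns are logit quantal responses, one per follower type; `Vs` holds hypothetical follower payoff matrices; `lam` is a hypothetical rationality parameter; `g_t` is a one-hot indicator of the follower type at round `t`; `b_t` normalizes the memory weights `a`; and the maximum over the simplex is approximated on a finite list of candidate strategies.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, v))


def quantal_response(x, V, lam=1.0):
    """Logit quantal response of one follower type (assumed form of the QR model).

    x: leader mixed strategy over N actions; V: N x M follower payoff matrix.
    """
    scores = [lam * sum(x[i] * V[i][j] for i in range(len(x)))
              for j in range(len(V[0]))]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def leader_utilities(x_resp, x_play, U, Vs, lam=1.0):
    """Vector Y(x_resp)^T U^T x_play: leader utility against each follower type."""
    payoff = [sum(x_play[i] * U[i][j] for i in range(len(x_play)))
              for j in range(len(U[0]))]  # U^T x_play
    return [dot(quantal_response(x_resp, V, lam), payoff) for V in Vs]


def aggregate(xs, t, a):
    """z_t: weighted average of past commitments, weight a[lag] on x_{t-lag}.

    Lags beyond len(a) get zero weight (bounded memory); b_t is the weight sum.
    """
    lags = range(min(t + 1, len(a)))
    b_t = sum(a[lag] for lag in lags)
    return [sum(a[lag] * xs[t - lag][i] for lag in lags) / b_t
            for i in range(len(xs[t]))]


def regret(xs, gs, U, Vs, candidates, lam=1.0, a=(1.0,)):
    """Regret_M(H) with the max taken over `candidates` only.

    With a=(1.0,), z_t equals x_t and this reduces to the memory-less Regret(H).
    """
    G = [sum(g[k] for g in gs) for k in range(len(gs[0]))]  # G_H
    best = max(dot(leader_utilities(x, x, U, Vs, lam), G) for x in candidates)
    earned = sum(
        dot(leader_utilities(aggregate(xs, t, a), xs[t], U, Vs, lam), gs[t])
        for t in range(len(xs))
    )
    return best - earned


if __name__ == "__main__":
    U = [[1.0, 0.0], [0.0, 2.0]]                     # leader payoffs (made up)
    Vs = [[[1.0, 0.0], [0.0, 1.0]],                  # follower type 0 (made up)
          [[0.0, 1.0], [1.0, 0.0]]]                  # follower type 1 (made up)
    grid = [[k / 10, 1 - k / 10] for k in range(11)]
    xs = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]        # leader's commitments
    gs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]        # follower types per round
    print("memory-less regret:", regret(xs, gs, U, Vs, grid))
    print("memory B=2 regret: ", regret(xs, gs, U, Vs, grid, a=(1.0, 1.0)))
```

Because the response is evaluated at `aggregate(xs, t, a)` instead of `xs[t]`, a leader that repeats one commitment every round sees identical values under both definitions. The two only diverge when the commitments change over time.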

Responding to Promises: No-regret learning against followers with memory
