Abstract: In multi-agent problems requiring a high degree of cooperation, success often depends on the ability of the agents to adapt to each other's behavior. A natural solution concept in such settings is the Stackelberg equilibrium, in which the "leader" agent selects the strategy that maximizes its own payoff given that the "follower" agent will choose their best response to this strategy. Recent work has extended this solution concept to two-player differentiable games, such as those arising from multi-agent deep reinforcement learning, in the form of the *differential* Stackelberg equilibrium. While this previous work has presented learning dynamics which converge to such equilibria, these dynamics are "coupled" in the sense that the learning updates for the leader's strategy require some information about the follower's payoff function. As such, these methods cannot be applied to truly decentralised multi-agent settings, particularly ad hoc cooperation, where each agent only has access to its own payoff function. In this work we present "uncoupled" learning dynamics based on zeroth-order gradient estimators, in which each agent's strategy update depends only on their observations of the other's behavior. We analyze the convergence of these dynamics in general-sum games, and prove that they converge to differential Stackelberg equilibria under the same conditions as previous coupled methods. Furthermore, we present an online mechanism by which symmetric learners can negotiate leader-follower roles. We conclude with a discussion of the implications of our work for multi-agent reinforcement learning and ad hoc collaboration more generally.

What problem does this paper attempt to address?

This paper addresses how to find a Differential Stackelberg Equilibrium (DSE) in a multi-agent system when no agent knows the other agents' payoff functions. Specifically, the authors propose an uncoupled learning method, Hierarchical learning with Commitments (Hi-C), for fully independent multi-agent learning scenarios such as ad hoc teamwork.

### Problem Background

In multi-agent environments, especially those requiring a high degree of cooperation, success often depends on the agents' ability to adapt to each other's behavior. The Stackelberg equilibrium is a natural solution concept in such settings: the "leader" selects the strategy that maximizes its own payoff, given that the "follower" will choose its best response to that strategy. However, existing methods for finding a DSE are "coupled", meaning that the leader's strategy update requires some information about the follower's payoff function. This rules them out in truly decentralized multi-agent settings, particularly ad hoc cooperation, where each agent can access only its own payoff function.

### Core Contributions of the Paper

1. **Uncoupled Learning Dynamics**: The paper proposes "uncoupled" learning dynamics based on zeroth-order gradient estimators, in which each agent's strategy update depends only on its observations of the other agent's behavior.
2. **Convergence Analysis**: The authors analyze the convergence of these dynamics in general-sum games and prove that they converge to a DSE under the same conditions as previous coupled methods.
3. **Online Role Negotiation Mechanism**: The paper introduces an online mechanism by which symmetric learners can negotiate leader-follower roles while solving the underlying differentiable game.
4. **Practical Applications**: The method applies to multi-agent reinforcement learning and ad hoc collaboration, especially when agents cannot share internal information or payoff functions.

### Mathematical Formula Representation

- Definition of a Differential Stackelberg Equilibrium (DSE), where \(f_1\) and \(f_2\) are the leader's and follower's payoff functions and \(r(x)\) is the follower's best response to the leader's strategy \(x\):

\[
\begin{aligned}
&\text{Condition (I):} \quad \nabla_x [f_1(x^*, r(x^*))] = 0 \quad \text{and} \quad \nabla_y [f_2(x^*, y^*)] = 0, \\
&\text{Condition (II):} \quad \nabla_{xx} [f_1(x^*, r(x^*))] \quad \text{and} \quad \nabla_{yy} [f_2(x^*, y^*)] \quad \text{are both negative definite}.
\end{aligned}
\]

- Update rule for the leader in the Hi-C algorithm:

\[
x_i^{n+1} = x_i^n + \alpha_n \frac{f_1(\tilde{x}_n, \tilde{y}_n) + w_n}{\delta_n \Delta_i^n}
\]

where \(\tilde{x}_n = x_n + \delta_n \Delta_n\) is the perturbed strategy, and \(\tilde{y}_n\) is the follower's final strategy within interval \(n\), which is used as an estimate of \(r(\tilde{x}_n)\).

Because this update uses only the leader's own observed payoff and the follower's observed behavior, Hi-C can find a DSE without access to the follower's payoff function, which removes the limitation of existing methods in scenarios such as ad hoc cooperation.
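The leader update above can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions that are not stated in the source: a toy quadratic game in which the follower's best response is \(r(x) = 0.5x\), random sign (\(\pm 1\)) perturbations \(\Delta_n\), Gaussian payoff noise \(w_n\), polynomially decaying schedules for \(\alpha_n\) and \(\delta_n\), and a follower that runs a few gradient-ascent steps on its own payoff while the leader holds \(\tilde{x}_n\) fixed. All names (`f1`, `grad_y_f2`, `hi_c_sketch`) and constants are hypothetical. The payoffs are deliberately chosen so that \(f_1\) is small near the equilibrium, which keeps the variance of the single-measurement estimator manageable.

```python
import numpy as np

A_RESP = 0.5                   # assumed: follower's best response is r(x) = A_RESP * x
C = np.array([1.0, 0.0])       # assumed target for the leader's own strategy
D = np.array([0.8, 0.2])       # assumed target the leader wants the follower to reach

def f1(x, y):
    """Leader payoff (toy quadratic, hypothetical)."""
    return -np.sum((x - C) ** 2) - np.sum((y - D) ** 2)

def grad_y_f2(x, y):
    """Gradient in y of the follower payoff f2(x, y) = -||y - A_RESP * x||^2."""
    return -2.0 * (y - A_RESP * x)

def hi_c_sketch(n_iters=5000, follower_steps=10, noise_std=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(2)            # leader strategy x_n
    y = np.zeros(2)            # follower strategy
    for n in range(n_iters):
        alpha = 0.2 / (n + 50) ** 0.7   # assumed step-size schedule alpha_n
        delta = 0.3 / (n + 1) ** 0.1    # assumed perturbation-size schedule delta_n
        Delta = rng.choice([-1.0, 1.0], size=2)   # random sign perturbation Delta_n
        x_tilde = x + delta * Delta     # leader commits to the perturbed strategy
        # Follower adapts using only its own payoff while the commitment holds;
        # its final strategy in the interval serves as the estimate of r(x_tilde).
        for _ in range(follower_steps):
            y = y + 0.25 * grad_y_f2(x_tilde, y)
        w = noise_std * rng.standard_normal()     # payoff observation noise w_n
        # Zeroth-order update: uses only the leader's observed (noisy) payoff.
        x = x + alpha * (f1(x_tilde, y) + w) / (delta * Delta)
    return x, y

# Closed-form stationary point of f1(x, r(x)) for this toy game, for comparison.
x_star = (C + A_RESP * D) / (1 + A_RESP ** 2)
x_hat, y_hat = hi_c_sketch()
print("leader estimate:", x_hat, "closed form:", x_star)
print("follower strategy:", y_hat, "best response to x_star:", A_RESP * x_star)
```

In this toy game, \(f_1(x, r(x))\) has Hessian \(-2(1 + 0.5^2)I\) and \(f_2\) has Hessian \(-2I\) in \(y\), so both parts of Condition (II) hold at the stationary point. Note that the leader never calls `grad_y_f2`; it sees only the follower's resulting strategy and its own payoff.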

Uncoupled Learning of Differential Stackelberg Equilibria with Commitments

Convergence of Learning Dynamics in Stackelberg Games

Decentralized Online Learning in General-Sum Stackelberg Games

Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning

On decentralized computation of the leader's strategy in bi-level games

Impact of Decentralized Learning on Player Utilities in Stackelberg Games

No-Regret Learning for Stackelberg Equilibrium Computation in Newsvendor Pricing Games

Stackelberg Meta-Learning Based Control for Guided Cooperative LQG Systems

Follower Agnostic Methods for Stackelberg Games

Equilibria of Fully Decentralized Learning in Networked Systems

Decentralized and Uncoordinated Learning of Stable Matchings: A Game-Theoretic Approach

Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning

Higher-Order Uncoupled Dynamics Do Not Lead to Nash Equilibrium -- Except When They Do

Robust No-Regret Learning in Min-Max Stackelberg Games

Efficient Stackelberg Strategies for Finitely Repeated Games

Solving Strongly Convex and Smooth Stackelberg Games Without Modeling the Follower

An overlapping information linear-quadratic Stackelberg stochastic differential game with two leaders and two followers

A Three-level Stochastic Linear-quadratic Stackelberg Differential Game with Asymmetric Information

Calibrated Stackelberg Games: Learning Optimal Commitments Against Calibrated Agents

Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design

The Danger Of Arrogance: Welfare Equilibra As A Solution To Stackelberg Self-Play In Non-Coincidental Games