Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and Limited Liability
Natalie Collina, Varun Gupta, Aaron Roth
2024-02-27
Abstract: We study a repeated contracting setting in which a Principal adaptively
chooses amongst $k$ Agents at each of $T$ rounds. The Agents are non-myopic,
and so a mechanism for the Principal induces a $T$-round extensive form game
amongst the Agents. We give several results aimed at understanding an
under-explored aspect of contract theory -- the game induced when choosing an
Agent to contract with. First, we show that this game admits a pure-strategy
\emph{non-responsive} equilibrium amongst the Agents -- informally an
equilibrium in which the Agents' actions depend on the history of realized
states of nature, but not on the history of each other's actions, and so avoids
the complexities of collusion and threats. Next, we show that if the Principal
selects Agents using a \emph{monotone} bandit algorithm, then for any concave
contract, in any such equilibrium, the Principal obtains no regret to
contracting with the best Agent in hindsight -- not just given their realized
actions, but also to the counterfactual world in which they had offered a
guaranteed $T$-round contract to the best Agent in hindsight, which would have
induced a different sequence of actions. Finally, we show that if the Principal
selects Agents using a monotone bandit algorithm which guarantees no
swap-regret, then the Principal can additionally offer only limited liability
contracts (in which the Agent never needs to pay the Principal) while getting
no-regret to the counterfactual world in which she offered a linear contract to
the best Agent in hindsight -- despite the fact that linear contracts are not
limited liability. We instantiate this theorem by demonstrating the existence
of a monotone no swap-regret bandit algorithm, which to our knowledge has not
previously appeared in the literature.
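To make the gap between no-regret and no-swap-regret concrete, the sketch below computes both quantities for a fixed table of rewards. It is illustrative only and makes several assumptions not taken from the paper. `rewards[t][i]` is the Principal's reward had Agent `i` been chosen at round `t`, treated as fixed, so the sketch ignores the counterfactual (policy-regret) effect of changing Agents' behavior. Every entry is assumed known after the fact, although a bandit algorithm only observes the chosen Agent's outcome. The function names `external_regret` and `swap_regret` are hypothetical. External regret compares against the best single Agent in hindsight. Swap regret compares against the best remapping of each chosen Agent to a replacement, which decomposes into one independent choice per Agent.

```python
def external_regret(rewards, actions):
    """Best fixed Agent in hindsight minus the realized reward."""
    T, k = len(rewards), len(rewards[0])
    realized = sum(rewards[t][actions[t]] for t in range(T))
    best_fixed = max(sum(rewards[t][a] for t in range(T)) for a in range(k))
    return best_fixed - realized


def swap_regret(rewards, actions):
    """Regret to the best swap function phi: {0..k-1} -> {0..k-1}.

    The optimal phi decomposes: for each Agent i, pick the single best
    replacement j for exactly the rounds on which i was chosen.
    """
    T, k = len(rewards), len(rewards[0])
    total = 0
    for i in range(k):
        rounds = [t for t in range(T) if actions[t] == i]
        played = sum(rewards[t][i] for t in rounds)
        best = max(sum(rewards[t][j] for t in rounds) for j in range(k))
        total += best - played
    return total


if __name__ == "__main__":
    # Hypothetical 4-round, 2-Agent example: the chosen Agent is always the worse one.
    rewards = [[1, 0], [0, 1], [1, 0], [0, 1]]
    actions = [1, 0, 1, 0]
    print(external_regret(rewards, actions))  # 2
    print(swap_regret(rewards, actions))      # 4
```

Because a constant swap function recovers the best fixed Agent, swap regret is always at least external regret. In this example it is strictly larger.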
Machine Learning, Computer Science and Game Theory, Data Structures and Algorithms