Abstract: Agents in mixed-motive coordination problems such as Chicken may fail to coordinate on a Pareto-efficient outcome. Safe Pareto improvements (SPIs) were originally proposed to mitigate miscoordination in cases where players lack probabilistic beliefs as to how their delegates will play a game; delegates are instructed to behave so as to guarantee a Pareto improvement on how they would play by default. More generally, SPIs may be defined as transformations of strategy profiles such that all players are necessarily better off under the transformed profile. In this work, we investigate the extent to which SPIs can reduce downsides of miscoordination between expected utility-maximizing agents. We consider games in which players submit computer programs that can condition their decisions on each other's code, and use this property to construct SPIs using programs capable of renegotiation. We first show that under mild conditions on players' beliefs, each player always prefers to use renegotiation. Next, we show that under similar assumptions, each player always prefers to be willing to renegotiate at least to the point at which they receive the lowest payoff they can attain in any efficient outcome. Thus subjectively optimal play guarantees players at least these payoffs, without the need for coordination on specific Pareto improvements. Lastly, we prove that renegotiation does not guarantee players any improvements on this bound.

What problem does this paper attempt to address?

This paper addresses the problem that agents may fail to coordinate on a Pareto-efficient outcome in mixed-motive coordination problems (such as the game of Chicken). Specifically, it explores how safe Pareto improvements (SPIs) can reduce the costs of such coordination failures. In particular, the authors study how far SPIs can mitigate these costs when players submit computer programs that can condition their decisions on each other's code.

### Core Problems of the Paper

1. **Coordination Failure Problem**: In some games, agents may aim for different Pareto-efficient equilibria, so that the resulting strategy profile is not an equilibrium and the outcome is inefficient.
2. **Subjectively Optimal Strategy Problem**: Even if players are capable of conditional commitment (as in program games), they may still end up in suboptimal outcomes because they hold different beliefs about the other party's behavior.
3. **SPI Selection Problem**: How to coordinate among multiple possible SPIs so that all players obtain a Pareto improvement on the default outcome.

### Main Contributions

1. **Constructing SPIs**: The authors propose a method of constructing SPIs using programs capable of renegotiation. These programs first check whether the default programs would lead to an inefficient outcome and, if so, attempt to reach a Pareto improvement through renegotiation.
2. **Guaranteeing Minimum Payoffs**: Under mild conditions on players' beliefs, the authors prove that each player always prefers to be willing to renegotiate at least to the point at which they receive the lowest payoff they can attain in any efficient outcome. This payoff profile is called the Pareto meet minimum (PMM).
3. **Theoretical Results**: The authors show that, in subjective equilibrium, renegotiation does not guarantee players any improvement on the PMM bound. This is because the PMM is the most efficient point at which all players still have the opportunity to renegotiate to their desired outcome.

### Key Concepts and Formulas

- **Pareto Improvement**: An outcome is a Pareto improvement on another if it makes every player no worse off and at least one player strictly better off.

  \[ x \succ y \text{ if } x_i \geq y_i \text{ for all } i \text{ and } x_j > y_j \text{ for some } j \]

- **Pareto Meet Minimum (PMM)**: The profile of minimum payoffs that each player can obtain in any efficient outcome, where \(E\) denotes the set of efficient outcomes and \(u_i\) is player \(i\)'s payoff.

  \[ u_{\text{PMM}} = (\min_{a \in E} u_i(a))_{i = 1}^n \]

- **Subjective Equilibrium**: Given each player's beliefs \(\beta_i\) about the other players' programs, each player's program maximizes that player's expected utility.

  \[ p_i^* \in \arg\max_{p_i \in P_i} \mathbb{E}_{p_{-i} \sim \beta_i}[U_i(p_i, p_{-i})] \]

### Summary

The main objective of the paper is to explore how renegotiation in program games can reduce the inefficiency caused by coordination failures. The authors prove that, under mild conditions on players' beliefs, subjectively optimal play guarantees each player at least their PMM payoff, without the need to coordinate on a specific Pareto improvement, and that renegotiation cannot guarantee more than this bound.
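To make the Pareto improvement and PMM definitions above concrete, the following minimal Python sketch computes the efficient outcomes and the PMM payoff profile for a small Chicken-style game. The payoff numbers, the action names, and the helper functions `pareto_dominates`, `efficient_outcomes`, and `pareto_meet_minimum` are illustrative assumptions, not taken from the paper. The sketch also restricts attention to pure-action outcomes, whereas the paper's setting may allow a richer set of feasible payoffs.

```python
# Hypothetical Chicken payoffs as (player 1, player 2); illustrative only.
PAYOFFS = {
    ("swerve", "swerve"): (2, 2),
    ("swerve", "dare"): (1, 3),
    ("dare", "swerve"): (3, 1),
    ("dare", "dare"): (0, 0),  # the miscoordination (crash) outcome
}


def pareto_dominates(x, y):
    """True if x is a Pareto improvement on y:
    nobody is worse off and at least one player is strictly better off."""
    no_one_worse = all(a >= b for a, b in zip(x, y))
    someone_better = any(a > b for a, b in zip(x, y))
    return no_one_worse and someone_better


def efficient_outcomes(payoffs):
    """Outcomes whose payoff vector is not Pareto-dominated by any other outcome."""
    vectors = list(payoffs.values())
    return {
        actions: u
        for actions, u in payoffs.items()
        if not any(pareto_dominates(v, u) for v in vectors)
    }


def pareto_meet_minimum(payoffs):
    """For each player, the lowest payoff they receive in any efficient outcome."""
    efficient = list(efficient_outcomes(payoffs).values())
    n_players = len(efficient[0])
    return tuple(min(u[i] for u in efficient) for i in range(n_players))


pmm = pareto_meet_minimum(PAYOFFS)
print(sorted(efficient_outcomes(PAYOFFS)))
print(pmm)  # (1, 1)
print(pareto_dominates(pmm, PAYOFFS[("dare", "dare")]))  # True
```

With these assumed payoffs, every outcome except the mutual `dare` crash is efficient, and each player's worst efficient payoff is 1, so the PMM profile is (1, 1). This is a Pareto improvement on the crash payoff (0, 0), which illustrates the kind of guarantee the summary describes.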

Safe Pareto Improvements for Expected Utility Maximizers in Program Games

Social Optimum Equilibrium Selection for Distributed Multi-Agent Optimization

Reducing Optimism Bias in Incomplete Cooperative Games

An Objective Improvement Approach to Solving Discounted Payoff Games

Policy Iteration for Pareto-Optimal Policies in Stochastic Stackelberg Games

Using a Stochastic Agent Model to Optimize Performance in Divergent Interest Tacit Coordination Games

Safe Equilibrium

Non-oblivious Strategy Improvement

Smoothing Policy Iteration for Zero-sum Markov Games

Safe Opponent Exploitation For Epsilon Equilibrium Strategies

Optimize Neural Fictitious Self-Play in Regret Minimization Thinking

Opportunistic Qualitative Planning in Stochastic Systems with Incomplete Preferences over Reachability Objectives

Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies

Expectation in Stochastic Games with Prefix-independent Objectives

Cooperation and self-regulation in a model of agents playing different games

How much is convenient to defect? A method to estimate the cooperation probability in Prisoner's Dilemma and other games

Strategic Play By Resource-Bounded Agents in Security Games

Dynamics of Profit-Sharing Games

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Learning in Multi-Objective Public Goods Games with Non-Linear Utilities

A Risk-Averse Equilibrium for Multi-Agent Systems