Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
2024-08-20
Abstract: We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of a game and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm that solves a convex optimization problem with linear constraints and then performs random perturbation to obtain a modification plan with a near-optimal cost.
Computer Science and Game Theory, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to modify a zero-sum Markov game at minimal cost so that a chosen Nash equilibrium is achieved and the game value falls within a chosen range. Specifically, a third party (either a benevolent designer or a malicious adversary) modifies the reward function of a zero-sum Markov game so that a target policy profile becomes the unique Markov perfect Nash equilibrium (MPE) and its value lies within a specified range, while keeping the cost of the modification relative to the original game as small as possible.

### Problem Background

Consider a two-player zero-sum Markov game \(G\), where:

- \(S\) is the finite state space.
- \(A_i\) is the finite action set of player \(i\).
- \(H\) is the time horizon.

Such a game is known to have at least one MPE, and all MPEs share the same game value \(v^*\), which is player 1's expected gain and player 2's expected loss. In some applications it is desirable to change the equilibrium strategy or the game value. For example, a benevolent third party may wish to achieve fairness (i.e., \(v = 0\)), or to make the equilibrium strategy more intuitive and easier for boundedly rational players to find and execute.

### Research Objectives

The paper aims to answer the following questions:

1. **When is game modification feasible?** Under what conditions can the specified Nash equilibrium and game value be achieved by modifying the reward function?
2. **How can an efficient algorithm be developed?** The algorithm should find an approximately optimal modification such that the modified game satisfies the unique-equilibrium condition and the given value range while minimizing the modification cost.

### Main Contributions

1. **Sufficient and Necessary Conditions**: The paper provides sufficient and necessary conditions for the feasibility of the game modification problem, ensuring that the modified game has a unique Markov perfect Nash equilibrium.
2. **Efficient Algorithm**: The paper proposes an efficient "Relax and Perturb" (RAP) algorithm, which provably finds an approximately optimal solution when the loss function is convex.
3. **Theoretical Analysis**: By introducing the SIISOW and INV conditions, the paper transforms the game modification problem into an optimization problem with linear and spectral constraints and fully characterizes its feasibility.

### Mathematical Expression

The game modification problem can be formalized as the following optimization problem:

\[
\begin{aligned}
\inf_{R}\quad & \ell(R, R^*) \\
\text{s.t.}\quad & (p, q) \text{ is the unique MPE of } (R, P^*), \\
& \text{value}(R, P^*) \in [v_{\min}, v_{\max}], \\
& \text{all entries of } R \text{ lie in } [-b, b],
\end{aligned}
\]

where:

- \(R^*\) and \(P^*\) are the reward matrix and transition probability matrix of the original game, respectively.
- \((p, q)\) is the target policy profile.
- \(\ell(R, R^*)\) is a loss function that measures the difference between the modified and original games, for example \(\ell(R, R^*)=\|R - R^*\|\).

This optimization problem captures the paper's goal: achieving a specific Nash equilibrium and game value while minimizing the modification cost.
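
To make the constraints of this optimization problem concrete, the following is a minimal sketch for the simplest special case only: a single state and horizon \(H = 1\), so the game reduces to a zero-sum matrix game with reward matrix \(R\). It is not the paper's RAP algorithm. It only checks whether a candidate \(R\) makes \((p, q)\) a Nash equilibrium, whether the value and entry bounds hold, and what the modification cost is. The function names, the tolerance `tol`, and the choice of the Frobenius norm for \(\ell\) are assumptions made for illustration. The check does not verify uniqueness of the equilibrium, which the paper handles through its SIISOW and INV conditions.

```python
from math import sqrt


def expected_payoff(R, p, q):
    """Player 1's expected reward p^T R q in a zero-sum matrix game."""
    return sum(p[i] * R[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))


def is_nash(R, p, q, tol=1e-9):
    """True if neither player can gain by deviating to a pure action."""
    v = expected_payoff(R, p, q)
    # Player 1 (maximizer): payoff of each pure row against q.
    rows = [sum(R[i][j] * q[j] for j in range(len(q))) for i in range(len(p))]
    # Player 2 (minimizer): payoff of each pure column against p.
    cols = [sum(p[i] * R[i][j] for i in range(len(p))) for j in range(len(q))]
    return max(rows) <= v + tol and min(cols) >= v - tol


def frobenius_cost(R, R_orig):
    """Assumed loss: Frobenius norm of R - R_orig."""
    return sqrt(sum((R[i][j] - R_orig[i][j]) ** 2
                    for i in range(len(R)) for j in range(len(R[0]))))


def check_modification(R, R_orig, p, q, v_min, v_max, b, tol=1e-9):
    """Report which constraints a candidate reward matrix R satisfies.

    Note: equilibrium uniqueness is NOT checked here.
    """
    v = expected_payoff(R, p, q)
    return {
        "is_nash": is_nash(R, p, q, tol),
        "value": v,
        "value_in_range": v_min - tol <= v <= v_max + tol,
        "bounded": all(abs(x) <= b + tol for row in R for x in row),
        "cost": frobenius_cost(R, R_orig),
    }


# Hypothetical example: make uniform play an equilibrium with value 0.
R_orig = [[2.0, -1.0], [-1.0, 1.0]]
R_new = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies
p = q = [0.5, 0.5]

report_orig = check_modification(R_orig, R_orig, p, q, 0.0, 0.0, 2.0)
report_new = check_modification(R_new, R_orig, p, q, 0.0, 0.0, 2.0)
print(report_orig["is_nash"], report_new["is_nash"], report_new["cost"])
# prints: False True 1.0
```

In this toy example, changing a single entry of the reward matrix makes uniform play an equilibrium with value 0 at cost 1.0. A full Markov game would require this kind of check at every state and stage, with continuation values folded into the stage rewards.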