Abstract: We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of a game and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm that solves a convex optimization problem with linear constraints and then performs random perturbation to obtain a modification plan with a near-optimal cost.
What problem does this paper attempt to address?
The paper asks how to modify a zero-sum Markov game at minimum cost so that a chosen Nash equilibrium is installed and the game value falls within a specified range. Specifically, a third party (a benevolent designer or a malicious adversary) modifies the reward function of the game so that a target policy profile becomes the unique Markov perfect Nash equilibrium and its value lies within a target range. The modification must also minimize the cost of departing from the original game.
### Problem Background
Consider a two-player zero-sum Markov game \(G\), where:
- \(S\) is the finite state space.
- \(A_i\) is the finite action set of player \(i\).
- \(H\) is the time horizon.
Such a game is known to have at least one Markov perfect Nash equilibrium (MPE), and all MPEs share the same game value \(v^*\), which is player 1's expected gain and, equivalently, player 2's expected loss. In some applications it is desirable to change this equilibrium strategy or game value. For example, a benevolent third party may wish to make the game fair (i.e., \(v = 0\)), or to make the equilibrium strategy more intuitive and easier for boundedly rational players to find and execute.
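As a small illustration of the equilibrium and value notions above, the following sketch checks whether a strategy pair is a Nash equilibrium in the simplest special case: a single state and a single step (\(H = 1\)), where the Markov game reduces to a zero-sum matrix game. The function names `expected_payoff` and `is_nash` are hypothetical and not taken from the paper, and the check covers equilibrium only, not uniqueness.

```python
from fractions import Fraction


def expected_payoff(R, p, q):
    """Player 1's expected payoff p^T R q in a zero-sum matrix game."""
    return sum(p[i] * R[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))


def is_nash(R, p, q, tol=0):
    """Return (is_equilibrium, value) for the strategy pair (p, q).

    Assumption: single-state, one-step game, so R is one payoff matrix.
    Player 1 (rows) maximizes and player 2 (columns) minimizes.
    """
    v = expected_payoff(R, p, q)
    # Payoff of each pure row against q: player 1 must not gain by deviating.
    row_payoffs = [sum(R[i][j] * q[j] for j in range(len(q)))
                   for i in range(len(R))]
    # Payoff of each pure column against p: player 2 must not lower it.
    col_payoffs = [sum(p[i] * R[i][j] for i in range(len(p)))
                   for j in range(len(R[0]))]
    no_row_gain = max(row_payoffs) <= v + tol
    no_col_gain = min(col_payoffs) >= v - tol
    return (no_row_gain and no_col_gain), v


half = Fraction(1, 2)
matching_pennies = [[1, -1], [-1, 1]]
ok, value = is_nash(matching_pennies, [half, half], [half, half])
print(ok, value)  # True 0
```

Exact rational arithmetic via `Fraction` avoids floating-point tolerance issues in the inequality checks; with floats, a small positive `tol` would be needed.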
### Research Objectives
The goals of the paper are to answer the following questions:
1. **When is game modification feasible?** Under what conditions can the specified Nash equilibrium and game value be achieved by modifying the reward function?
2. **How can it be done efficiently?** How can one find a near-optimal modification such that the modified game has the target profile as its unique equilibrium and a value in the given range, at minimal modification cost?
### Main Contributions
1. **Sufficient and Necessary Conditions**: The paper gives sufficient and necessary conditions for the game modification problem to be feasible, that is, for the target profile to be installable as the unique Markov perfect Nash equilibrium of the modified game.
2. **Efficient Algorithm**: The paper proposes an efficient "Relax and Perturb" (RAP) algorithm, which provably finds a near-optimal solution when the loss function is convex.
3. **Theoretical Analysis**: By introducing the SIISOW and INV conditions, the paper transforms the game modification problem into an optimization problem with linear and spectral constraints and fully characterizes its feasibility.
### Mathematical Expression
The game modification problem can be formalized as the following optimization problem:
\[
\begin{aligned}
& \inf_{R} \ell(R, R^*) \\
& \text{s.t. } (p, q) \text{ is the unique MPE of } (R, P^*) \\
& \text{value}(R, P^*) \in [v_{\min}, v_{\max}] \\
& \text{Elements of } R \text{ are within } [-b, b]
\end{aligned}
\]
where:
- \(R^*\) and \(P^*\) are the reward matrix and the transition probability matrix of the original game, respectively.
- \((p, q)\) is the target policy profile.
- \(\ell(R, R^*)\) is a loss function that measures the difference between the new and old games, for example, \(\ell(R, R^*)=\|R - R^*\|\).
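To make the constraints above concrete, here is a minimal sketch that evaluates a candidate modified reward matrix in the single-state, one-step special case. It is not the paper's algorithm: it only tests whether \((p, q)\) is an equilibrium (uniqueness is not verified), whether the value lies in \([v_{\min}, v_{\max}]\), whether the entries lie in \([-b, b]\), and it reports the cost under an entrywise \(L_1\) loss, which is an assumed choice of \(\ell\). The candidate is built by a constant shift of all rewards, a standard trick that preserves the equilibrium strategies of a matrix game and shifts its value by the same constant. The name `check_modification` is hypothetical.

```python
from fractions import Fraction


def check_modification(R, R_orig, p, q, v_min, v_max, b):
    """Evaluate a candidate reward matrix R against the problem's constraints.

    Assumptions: one state, one step; the loss is the entrywise L1 distance.
    Uniqueness of the equilibrium is NOT checked here.
    """
    n, m = len(p), len(q)
    rows = [sum(R[i][j] * q[j] for j in range(m)) for i in range(n)]
    cols = [sum(p[i] * R[i][j] for i in range(n)) for j in range(m)]
    v = sum(p[i] * rows[i] for i in range(n))  # value p^T R q
    return {
        # No profitable pure deviation for the maximizer or the minimizer.
        "is_equilibrium": max(rows) <= v <= min(cols),
        "value": v,
        "value_in_range": v_min <= v <= v_max,
        "within_bounds": all(-b <= x <= b for row in R for x in row),
        "cost": sum(abs(R[i][j] - R_orig[i][j])
                    for i in range(n) for j in range(m)),
    }


half = Fraction(1, 2)
p = q = [half, half]
R_orig = [[2, 0], [0, 2]]  # value 1 under (p, q), so the game is not fair
# Candidate: shift every reward by -1 to move the value from 1 to 0.
R_new = [[x - 1 for x in row] for row in R_orig]

before = check_modification(R_orig, R_orig, p, q, 0, 0, 2)
after = check_modification(R_new, R_orig, p, q, 0, 0, 2)
print(before["value_in_range"], after["value_in_range"], after["cost"])
```

A constant shift is just one feasible candidate. Finding the cheapest modification in general requires solving the optimization problem itself.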
This formulation captures the paper's goal: install a specific Nash equilibrium and game value while minimizing the cost of modifying the original game.