Abstract: We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of a game and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm that solves a convex optimization problem with linear constraints and then performs random perturbation to obtain a modification plan with a near-optimal cost.

What problem does this paper attempt to address?

This paper addresses how to modify a zero-sum Markov game at minimal cost so that a chosen Nash equilibrium and value range are achieved. Specifically, the researchers ask how a third party (a benevolent designer or a malicious adversary) can modify the reward function of a zero-sum Markov game so that a target policy profile becomes the unique Markov perfect Nash equilibrium and its value lies within a specified range, while keeping the cost of modifying the original game as small as possible.

### Problem Background

Consider a two-player zero-sum Markov game \(G\), where:

- \(S\) is the finite state space.
- \(A_i\) is the finite action set of player \(i\).
- \(H\) is the time horizon.

Such a game is known to have at least one Markov perfect Nash equilibrium (MPE), and all MPEs share the same game value \(v^*\), which is player 1's expected gain and player 2's expected loss. In some applications it is desirable to change this equilibrium strategy or game value. For example, a benevolent third party may wish to make the game fair (i.e., \(v = 0\)), or to make the equilibrium strategy more intuitive and easier for boundedly rational players to find and execute.

### Research Objectives

The paper aims to answer the following questions:

1. **When can game modification succeed**: under what conditions can the specified Nash equilibrium and game value be achieved by modifying the reward function?
2. **How to develop an efficient algorithm**: how can one find an approximately optimal modification so that the modified game satisfies the unique-equilibrium condition and the given value range while minimizing the modification cost?

### Main Contributions

1. **Sufficient and Necessary Conditions**: The paper provides sufficient and necessary conditions for the feasibility of the game modification problem, ensuring that the modified game has a unique Markov perfect Nash equilibrium.
2. **Efficient Algorithm**: It proposes an efficient "Relax and Perturb" (RAP) algorithm, which provably finds an approximately optimal solution when the loss function is convex.
3. **Theoretical Analysis**: By introducing the SIISOW and INV conditions, it transforms the game modification problem into an optimization problem with linear and spectral constraints and fully characterizes its feasibility.

### Mathematical Expression

The game modification problem can be formalized as the following optimization problem:

\[
\begin{aligned}
\inf_{R} \quad & \ell(R, R^*) \\
\text{s.t.} \quad & (p, q) \text{ is the unique MPE of } (R, P^*), \\
& \text{value}(R, P^*) \in [v_{\min}, v_{\max}], \\
& \text{all entries of } R \text{ lie in } [-b, b],
\end{aligned}
\]

where:

- \(R^*\) and \(P^*\) are the reward matrix and transition probability matrix of the original game, respectively.
- \((p, q)\) is the target policy profile.
- \(\ell(R, R^*)\) is a loss function that measures the difference between the new and old games, for example \(\ell(R, R^*) = \|R - R^*\|\).

Solving this optimization problem approximately gives a modification whose cost is close to the minimum while installing the target Nash equilibrium and keeping the game value in the target range.
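As a hedged illustration of the constraints in the optimization problem above, the following Python sketch covers only the simplest special case: a single state, horizon \(H = 1\), and a deterministic target profile. In that case a strict saddle point of the reward matrix is enough to make the target the unique equilibrium. This is not the paper's RAP algorithm; the function names `is_strict_saddle` and `evaluate_modification`, the entrywise L1 cost, and the matching-pennies example are assumptions chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def is_strict_saddle(R: Matrix, i_star: int, j_star: int) -> bool:
    """Check whether (i_star, j_star) is a strict saddle point of R.

    Player 1 picks the row and maximizes; player 2 picks the column and
    minimizes. A strict saddle point means every unilateral deviation is
    strictly worse for the deviating player.
    """
    v = R[i_star][j_star]
    # Player 1 must strictly lose by switching rows against column j_star.
    row_ok = all(R[i][j_star] < v for i in range(len(R)) if i != i_star)
    # Player 2 must strictly lose (pay more) by switching columns against row i_star.
    col_ok = all(R[i_star][j] > v for j in range(len(R[0])) if j != j_star)
    return row_ok and col_ok


def evaluate_modification(
    R_new: Matrix,
    R_old: Matrix,
    i_star: int,
    j_star: int,
    v_min: float,
    v_max: float,
    b: float,
) -> Tuple[bool, float]:
    """Return (feasible, cost) for a candidate single-stage modification.

    feasible: all entries lie in [-b, b], the value at the target lies in
    [v_min, v_max], and the target is a strict saddle point.
    cost: entrywise L1 distance between R_new and R_old (an assumed loss).
    """
    within_bounds = all(-b <= x <= b for row in R_new for x in row)
    value = R_new[i_star][j_star]
    feasible = (
        within_bounds
        and v_min <= value <= v_max
        and is_strict_saddle(R_new, i_star, j_star)
    )
    cost = sum(
        abs(x - y)
        for row_new, row_old in zip(R_new, R_old)
        for x, y in zip(row_new, row_old)
    )
    return feasible, cost


# Hypothetical example: matching pennies, whose only equilibrium is mixed.
R_old = [[1.0, -1.0], [-1.0, 1.0]]
# Candidate modification installing the pure profile (row 0, column 0) with value 0.
R_new = [[0.0, 0.5], [-1.0, 1.0]]

feasible, cost = evaluate_modification(R_new, R_old, 0, 0, 0.0, 0.0, 1.0)
print(feasible, cost)  # True 2.5
```

In this toy instance, the value constraint forces the top-left entry to 0 and the strict inequality forces the top-right entry above 0, so under this strict-saddle check the L1 cost can approach 2 but never reach it. This is consistent with the problem being stated with an infimum rather than a minimum.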

Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value
