Abstract: In this paper, we consider a large class of constrained non-cooperative stochastic Markov games with countable state spaces and discounted cost criteria. In the one-player case, i.e., constrained discounted Markov decision models, it is possible to formulate a static optimisation problem whose solution determines a stationary optimal strategy (alias control or policy) in the dynamical infinite horizon model. This solution lies in the compact convex set of all occupation measures induced by strategies, defined on the set of state-action pairs. In the case of n-person discounted games, the occupation measures are induced by the strategies of all players. Therefore, it is difficult to generalise the approach for constrained discounted Markov decision processes directly. It is not clear how to define the domain for the best-response correspondence whose fixed point induces a stationary equilibrium in the Markov game. This domain should be the Cartesian product of compact convex sets in locally convex topological vector spaces. One of our main results shows how to overcome this difficulty and define a constrained non-cooperative static game whose Nash equilibrium induces a stationary Nash equilibrium in the Markov game. This is done for games with bounded cost functions and positive initial state distribution. An extension to a class of Markov games with unbounded costs and arbitrary initial state distribution relies on approximating the unbounded game by bounded ones with positive initial state distributions. In the unbounded case, we assume the uniform integrability of the discounted costs with respect to all probability measures induced by strategies of the players, defined on the space of plays (histories) of the game. Our assumptions are weaker than those applied in earlier works on discounted dynamic programming or stochastic games using so-called weighted norm approaches.
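To make the one-player picture concrete, the following is a minimal sketch of how a stationary strategy induces an occupation measure on state-action pairs, and how the strategy can be read back from that measure. It is not taken from the paper: the two-state, two-action model, all numerical values, and the names `occupation_measure`, `balance_residual`, `discounted_cost` and `strategy_from_measure` are illustrative assumptions, and the finite model stands in for the countable state space treated in the abstract.

```python
# Hypothetical 2-state, 2-action discounted model; every number is illustrative.
BETA = 0.9                      # discount factor
STATES = [0, 1]
ACTIONS = [0, 1]
NU = [0.5, 0.5]                 # positive initial state distribution
# P[x][a][y]: probability of moving from state x to state y under action a
P = [[[0.8, 0.2], [0.3, 0.7]],
     [[0.5, 0.5], [0.1, 0.9]]]
# PI[x][a]: stationary randomised strategy
PI = [[0.6, 0.4], [0.25, 0.75]]
# COST[x][a]: bounded one-stage cost
COST = [[1.0, 2.0], [0.5, 3.0]]


def occupation_measure(pi, horizon=2000):
    """Normalised discounted occupation measure mu(x, a), by truncated summation."""
    mu = [[0.0 for _ in ACTIONS] for _ in STATES]
    dist = list(NU)             # state distribution at the current stage
    weight = 1.0 - BETA         # (1 - beta) * beta**t
    for _ in range(horizon):
        for x in STATES:
            for a in ACTIONS:
                mu[x][a] += weight * dist[x] * pi[x][a]
        dist = [sum(dist[x] * pi[x][a] * P[x][a][y]
                    for x in STATES for a in ACTIONS)
                for y in STATES]
        weight *= BETA
    return mu


def balance_residual(mu):
    """Largest violation of the linear constraints that occupation measures satisfy."""
    worst = 0.0
    for y in STATES:
        lhs = sum(mu[y][a] for a in ACTIONS)
        rhs = (1.0 - BETA) * NU[y] + BETA * sum(
            mu[x][a] * P[x][a][y] for x in STATES for a in ACTIONS)
        worst = max(worst, abs(lhs - rhs))
    return worst


def discounted_cost(pi, sweeps=2000):
    """Expected discounted cost via iterative policy evaluation (independent check)."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [sum(pi[x][a] * (COST[x][a]
                             + BETA * sum(P[x][a][y] * v[y] for y in STATES))
                 for a in ACTIONS)
             for x in STATES]
    return sum(NU[x] * v[x] for x in STATES)


def strategy_from_measure(mu):
    """Recover a stationary strategy by conditioning mu on the state."""
    return [[mu[x][a] / sum(mu[x]) for a in ACTIONS] for x in STATES]


mu = occupation_measure(PI)
linear_cost = sum(mu[x][a] * COST[x][a]
                  for x in STATES for a in ACTIONS) / (1.0 - BETA)
```

In this sketch the discounted cost is a linear function of `mu`, which is why the one-player problem can be posed as a static optimisation problem over a convex set. The positive initial distribution keeps every state marginal of `mu` positive, so the division in `strategy_from_measure` is well defined. In the n-person game the measure depends on the strategies of all players at once, so this single-player construction does not carry over directly.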

Stochastic Dynamic Programming with Non-linear Discounting

Stochastic dynamic programming under recursive Epstein-Zin preferences

On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time

An approximation approach to dynamic programming with unbounded returns

Gradient-Bounded Dynamic Programming with Submodular and Concave Extensible Value Functions

On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

Constrained Markov Decision Processes with Non-constant Discount Factor

Constrained discounted stochastic games

Stochastic Knapsack Problem Revisited: Switch-Over Policies and Dynamic Pricing

Complexity of stochastic dual dynamic programming

Utility Maximization with Habit Formation: Dynamic Programming and Stochastic PDEs

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

Markov Decision Processes with Time-Varying Geometric Discounting

Dual dynamic programming for stochastic programs over an infinite horizon

Optimistic Planning by Regularized Dynamic Programming

Stochastic control up to a hitting time: optimality and rolling-horizon implementation

Long-Run Impulse Control with Generalized Discounting

Variational Dynamic Programming for Stochastic Optimal Control

On the Performance Bounds of some Policy Search Dynamic Programming Algorithms

Long Run Stochastic Control Problems with General Discounting