Abstract: Zero-sum Markov games (MGs) are an effective framework for multi-agent systems and robust control, in which a minimax problem is constructed to solve for the equilibrium policies. At present, this formulation is well studied in tabular settings, where the maximum operator is solved exactly to compute the worst-case value function. However, extending such methods to complex tasks is non-trivial, because finding the maximum over large-scale action spaces is usually cumbersome. In this paper, we propose the smoothing policy iteration (SPI) algorithm to solve zero-sum MGs approximately, where the maximum operator is replaced by the weighted LogSumExp (WLSE) function to obtain nearly optimal equilibrium policies. Specifically, the adversarial policy serves as the weight function, enabling efficient sampling over action spaces. We also prove the convergence of SPI and analyze its approximation error in the ∞-norm based on the contraction mapping theorem. In addition, we propose a model-based algorithm called Smooth adversarial Actor-critic (SaAC) by extending SPI with function approximation. The target value associated with the WLSE function is evaluated from sampled trajectories, a mean squared error loss is then constructed to optimize the value function, and gradient ascent-descent methods are adopted to optimize the protagonist and adversarial policies jointly. Furthermore, we incorporate the reparameterization technique into model-based gradient back-propagation to prevent gradients from vanishing due to sampling from the stochastic policies. We verify our algorithm in both tabular and function approximation settings. Results show that SPI can approximate the worst-case value function with high accuracy, and that SaAC can stabilize the training process and improve adversarial robustness by a large margin.
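To make the smoothing idea concrete, the following is a minimal sketch of how a weighted LogSumExp can stand in for a hard maximum over actions. The abstract does not state the exact WLSE definition, so the form used here, (1/β) log Σ_a w(a) exp(β x(a)) with w a probability distribution over actions and β > 0 a temperature, is an assumption, as are the function names `weighted_logsumexp` and `sampled_wlse`. The second function illustrates why using a policy as the weight function helps: the sum becomes an expectation that can be estimated from sampled actions.

```python
import numpy as np


def weighted_logsumexp(values, weights, beta):
    """Assumed WLSE form: (1/beta) * log(sum_a w(a) * exp(beta * x(a))).

    `weights` is taken to be a probability distribution over actions
    (for example, an adversarial policy at a fixed state).
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = values.max()  # subtract the max for numerical stability
    return m + np.log(np.sum(weights * np.exp(beta * (values - m)))) / beta


def sampled_wlse(values, weights, beta, n_samples, rng):
    """Monte Carlo estimate: draw actions from `weights`, then average exp(beta * x)."""
    values = np.asarray(values, dtype=float)
    idx = rng.choice(len(values), size=n_samples, p=weights)
    v = values[idx]
    m = v.max()
    return m + np.log(np.mean(np.exp(beta * (v - m)))) / beta


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]          # hypothetical action values at one state
    w = [1 / 3, 1 / 3, 1 / 3]    # hypothetical weight function (uniform policy)
    for beta in (0.001, 1.0, 10.0, 100.0):
        print(beta, weighted_logsumexp(q, w, beta))
    rng = np.random.default_rng(0)
    print(sampled_wlse(q, w, 1.0, 20000, rng))
```

Under this assumed form, the value never exceeds the true maximum, it approaches the maximum as β grows (provided the maximizing action has nonzero weight), and it approaches the weighted mean as β goes to zero. The gap to the maximum is at most -log(w(a*))/β, where a* is the maximizing action.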

Learning Optimal Policies in Potential Mean Field Games: Smoothed Policy Iteration Algorithms

A Policy Iteration Method for Inverse Mean Field Games

Smoothing Policy Iteration for Zero-sum Markov Games

Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis

Deep Policy Iteration for High-Dimensional Mean Field Games

A Single Online Agent Can Efficiently Learn Mean Field Games

A Policy Iteration Algorithm for N-player General-Sum Linear Quadratic Dynamic Games

A General Framework for Learning Mean-Field Games

Regularization of the policy updates for stabilizing Mean Field Games

From Nash Equilibrium to Social Optimum and vice versa: a Mean Field Perspective

Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

MF-OMO: An Optimization Formulation of Mean-Field Games

Linear-quadratic zero-sum mean-field type games: Optimality conditions and policy optimization

Analysis and Numerical Approximation of Stationary Second-Order Mean Field Game Partial Differential Inclusions

A Policy-Gradient Approach to Solving Imperfect-Information Games with Iterate Convergence

Relaxed Policy Iteration Algorithm for Nonlinear Zero-Sum Games with Application to H-infinity Control

Scalable Learning for Spatiotemporal Mean Field Games Using Physics-Informed Neural Operator

Mean-Field Learning: a Survey

Numerical methods for mean field games based on Gaussian processes and Fourier features

Fictitious Play via Finite Differences for Mean Field Games with Optimal Stopping

A fictitious-play finite-difference method for linearly solvable mean field games