Abstract: Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, we propose a deep reinforcement learning (DRL) algorithm that achieves population-dependent Nash equilibrium without the need for averaging or sampling from history, inspired by Munchausen RL and Online Mirror Descent. Through the design of an additional inner-loop replay buffer, the agents can effectively learn to achieve Nash equilibrium from any distribution, mitigating catastrophic forgetting. The resulting policy can be applied to various initial distributions. Numerical experiments on four canonical examples demonstrate our algorithm has better convergence properties than SOTA algorithms, in particular a DRL version of Fictitious Play for population-dependent policies.

What problem does this paper attempt to address?

The paper addresses how to efficiently learn Nash equilibrium policies that depend on the population distribution in Mean Field Games (MFGs). Existing methods face two main challenges when dealing with large-scale multi-agent systems:

1. **Computational complexity**: Many existing algorithms compute a best-response policy against the current population distribution at every iteration, which is very time-consuming in complex problems.
2. **Learning rate decay**: Some methods based on Fictitious Play (FP) update the population distribution by uniformly averaging all past population distributions, so the effective learning rate shrinks as the number of iterations grows.

To address these problems, the authors propose an algorithm based on Deep Reinforcement Learning (DRL) and Online Mirror Descent (OMD). It learns a "master policy" that can start from any initial distribution, so all players can act according to the Nash equilibrium without relearning a policy for each new distribution. An additional inner-loop replay buffer mitigates catastrophic forgetting, which helps the policy adapt to different initial distributions.

### Specific problems and solutions

- **Problem**: Existing methods struggle to learn policies that depend on the population distribution and perform poorly when faced with multiple initial distributions.
- **Solutions**:
  - A new DRL algorithm that combines ideas from Munchausen RL and OMD.
  - A dedicated replay buffer design that lets the algorithm learn Nash equilibrium policies from any initial distribution.
  - Numerical experiments on four canonical examples, in which the algorithm shows better convergence speed and stability than existing methods.
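The learning rate decay attributed to Fictitious Play above follows from the incremental form of a uniform average: after `k` iterates, the newest distribution enters with weight `1/k`. The following is a minimal sketch of that arithmetic only, not the paper's implementation; the function name `uniform_average_update` and the toy two-state distributions are assumptions made for illustration.

```python
def uniform_average_update(mu_bar, mu_new, k):
    """Fold the k-th distribution into a running uniform average.

    The new iterate receives weight 1/k, so its influence shrinks
    as the iteration count k grows.
    """
    step = 1.0 / k
    return [(1.0 - step) * b + step * m for b, m in zip(mu_bar, mu_new)]


# Toy sequence of population distributions over two states (hypothetical data).
iterates = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

mu_bar = [0.0, 0.0]
for k, mu in enumerate(iterates, start=1):
    mu_bar = uniform_average_update(mu_bar, mu, k)

# The running average matches the batch mean of all iterates.
batch_mean = [sum(col) / len(iterates) for col in zip(*iterates)]
```

With this update rule, the 100th iterate moves the averaged distribution by at most 1% of the gap to the new distribution, which is the slowdown that the summary describes.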
### Mathematical formula representation

The key formulas behind the algorithm are:

- **Q-function update**:
  \[ T_n = r_n^k + L_n^k + \gamma \sum_{a_{n+1}} \pi_k^{\theta'}(a_{n+1} \mid s_{n+1}^k) \left[ \tilde{Q}_k^{\theta'}(s_{n+1}^k, a_{n+1}) - L_{n+1}^k \right] \]
  where \( r_n^k = r(x_n, a_n, \mu_n^k) \), \( s_n^k = (n, x_n, \mu_n^k) \), and \( L_n^k = \tau \log \pi_{k-1}^\theta(a_n \mid s_n^k) \).
- **Policy update**:
  \[ \pi_k(\cdot \mid n, x, \mu) = \text{softmax}\left( \frac{1}{\tau} \tilde{Q}_\theta(n, x, \mu, \cdot) \right) \]

These formulas show how the deep network is trained by minimizing a loss against the target \( T_n \), and how the policy is then updated to approximate the Nash equilibrium.

In conclusion, this paper addresses the problem of learning Nash equilibrium policies that depend on the population distribution in MFGs and proposes a new DRL algorithm. The modified OMD scheme and the replay buffer design yield better convergence and adaptability.
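The two formulas above can be evaluated by hand on a toy example. The sketch below uses plain Python lists in place of a neural network; the function names, argument names, and numeric values are all assumptions for illustration, and the code is not the authors' implementation.

```python
import math


def softmax_policy(q_values, tau):
    """Policy update: softmax of the Q-values scaled by 1/tau."""
    scaled = [q / tau for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def munchausen_target(reward, log_prev_pi_taken, pi_next, q_next,
                      log_prev_pi_next, tau, gamma):
    """Target T_n = r + L_n + gamma * sum_a pi(a|s') * (Q(s', a) - L_{n+1}(a)).

    L_n is tau * log of the previous policy at the action taken;
    L_{n+1}(a) is tau * log of the previous policy at each next action a.
    """
    l_n = tau * log_prev_pi_taken
    expected = sum(
        p * (q - tau * lp)
        for p, q, lp in zip(pi_next, q_next, log_prev_pi_next)
    )
    return reward + l_n + gamma * expected


# Hypothetical values for one transition with two actions.
tau, gamma = 1.0, 0.9
policy = softmax_policy([2.0, 2.0], tau)  # equal Q-values give a uniform policy
target = munchausen_target(
    reward=1.0,
    log_prev_pi_taken=math.log(0.5),
    pi_next=[0.5, 0.5],
    q_next=[1.0, 3.0],
    log_prev_pi_next=[math.log(0.5), math.log(0.5)],
    tau=tau,
    gamma=gamma,
)
```

In a full training loop the lists `q_next` and `pi_next` would come from a target network evaluated at the next state, which includes the time step and the population distribution; here they are fixed numbers so the arithmetic is easy to check.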

Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

Scalable Offline Reinforcement Learning for Mean Field Games

Model-Free Reinforcement Learning for Mean Field Games

A Single Online Agent Can Efficiently Learn Mean Field Games

Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces

Model-free Reinforcement Learning for Non-stationary Mean Field Games

Agent-Level Maximum Entropy Inverse Reinforcement Learning for Mean Field Games

MF-OML: Online Mean-Field Reinforcement Learning with Occupation Measures for Large Population Games

Provable Fictitious Play for General Mean-Field Games

Reinforcement Learning for Finite Space Mean-Field Type Games

Reinforcement Learning for Mean Field Game

Learning Deep Mean Field Games for Modeling Large Population Behavior

Reinforcement Learning for Non-stationary Discrete-Time Linear–Quadratic Mean-Field Games in Multiple Populations

Learning in Mean Field Games: A Survey

Deep Reinforcement Learning for Nash Equilibrium of Differential Games

Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path

A General Framework for Learning Mean-Field Games

Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL

Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective