Abstract: Recent works have provided algorithms by which decentralised agents, which may be connected via a communication network, can learn equilibria in Mean-Field Games from a single, non-episodic run of the empirical system. However, these algorithms are given for tabular settings: this computationally limits the size of players' observation space, meaning that the algorithms can handle only small state spaces and cannot generalise beyond policies that depend on the ego player's state to so-called 'population-dependent' policies. We address this limitation by introducing function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method that has previously been employed only in finite-horizon, episodic, centralised settings. While this permits us to include the population's mean-field distribution in the observation for each player's policy, it is arguably unrealistic to assume that decentralised agents would have access to this global information: we therefore additionally provide new algorithms that allow agents to estimate the global empirical distribution based on a local neighbourhood, and to improve this estimate via communication over a given network. Our experiments showcase how the communication network allows decentralised agents to estimate the mean-field distribution for population-dependent policies, and that exchanging policy information helps networked agents to outperform both independent and even centralised agents in function-approximation settings, by an even greater margin than in tabular settings.
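The abstract combines two ideas: a function-approximated Q-function trained with a Munchausen-style target, and a policy whose input includes the mean-field distribution. The sketch below illustrates both under loudly stated assumptions. It uses a hypothetical linear Q-function over features formed by concatenating a one-hot ego state with the mean-field distribution `mu`, the policy `pi = softmax(Q / tau)`, and the generic Munchausen DQN regression target `r + alpha * tau * clip(ln pi(a|x), l0, 0) + gamma * sum_a' pi(a'|x') * (Q(x', a') - tau * ln pi(a'|x'))`. The names `features`, `munchausen_targets` and `sgd_step`, and the values of `tau`, `alpha`, `gamma` and `l0`, are illustrative choices rather than the paper's code. The paper uses neural networks, and its exact target may differ.

```python
import numpy as np

def features(state, mu, n_states):
    """Population-dependent input (assumed form): one-hot ego state
    concatenated with the (possibly estimated) mean-field distribution mu."""
    one_hot = np.zeros(n_states)
    one_hot[state] = 1.0
    return np.concatenate([one_hot, mu])

def log_softmax(q, tau):
    """Log of the policy softmax(q / tau), computed in a numerically stable way."""
    z = q / tau
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def munchausen_targets(W_target, phi, actions, rewards, phi_next,
                       tau=0.1, alpha=0.9, gamma=0.99, l0=-1.0):
    """Munchausen regression targets for a batch of transitions, using a
    hypothetical linear target Q-function Q(x, a) = W_target[a] @ phi(x)."""
    q = phi @ W_target.T                  # (batch, n_actions)
    q_next = phi_next @ W_target.T
    log_pi = log_softmax(q, tau)
    log_pi_next = log_softmax(q_next, tau)
    pi_next = np.exp(log_pi_next)
    # Munchausen bonus: scaled log-probability of the taken action, clipped to [l0, 0].
    taken = log_pi[np.arange(len(actions)), actions]
    bonus = alpha * tau * np.clip(taken, l0, 0.0)
    # Entropy-regularised (soft) value of the next observation.
    soft_v_next = (pi_next * (q_next - tau * log_pi_next)).sum(axis=1)
    return rewards + bonus + gamma * soft_v_next

def sgd_step(W, phi, actions, targets, lr=0.1):
    """One gradient step on half the mean squared error between Q(x, a) and targets."""
    err = (phi * W[actions]).sum(axis=1) - targets   # (batch,)
    grad = np.zeros_like(W)
    np.add.at(grad, actions, err[:, None] * phi)     # accumulate per action row
    return W - lr * grad / len(actions)

def mse(W, phi, actions, targets):
    """Mean squared error of the Q-values of the taken actions."""
    return np.mean(((phi * W[actions]).sum(axis=1) - targets) ** 2)

# Usage: 5 states, 3 actions, uniform mean field, zero-initialised weights.
n_states, n_actions = 5, 3
mu = np.full(n_states, 1 / n_states)
phi = np.stack([features(s, mu, n_states) for s in [0, 2, 4]])
phi_next = np.stack([features(s, mu, n_states) for s in [1, 2, 3]])
actions = np.array([0, 1, 2])
rewards = np.array([1.0, 0.0, -1.0])
W = np.zeros((n_actions, 2 * n_states))

targets = munchausen_targets(W, phi, actions, rewards, phi_next)
loss_before = mse(W, phi, actions, targets)
W = sgd_step(W, phi, actions, targets)
loss_after = mse(W, phi, actions, targets)
```

Including `mu` in the features is what makes the policy population-dependent. If the true distribution is unavailable, a locally estimated one can be passed in its place.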

What problem does this paper attempt to address?

The main problem this paper addresses is how agents in a large population can effectively learn Nash equilibrium policies in mean-field games (MFGs) using multi-agent reinforcement learning (MARL). Specifically, the paper focuses on the following points:

1. **Handling large state spaces**: Most existing algorithms apply only to small state spaces, where the Q-function can be represented as a table. This limits their practical use, because real-world problems often involve very large state spaces. The paper addresses this by introducing function approximation, which lets the algorithms handle larger state spaces.
2. **Policies that depend on the mean-field distribution**: In many practical scenarios, agents must base their decisions not only on their own state but also on the state distribution of the whole population (the mean-field distribution). The paper proposes a method in which the policy takes both the agent's own state and the mean-field distribution as input, yielding so-called "master policies" or "population-dependent policies".
3. **Decentralized learning and communication**: In practice, it is unrealistic to assume that every agent can access global information such as the mean-field distribution. The paper proposes a method based on network communication. Decentralized agents estimate the global mean-field distribution from local observations, refine that estimate by communicating with their neighbors, and improve their policies on this basis. This makes the algorithm more practical and avoids the computational bottleneck and single point of failure of centralized learning.
4. **Improving learning efficiency**: Through network communication, agents can share the parameters of better-performing policies, which accelerates learning. The paper shows that with network communication, decentralized agents converge to the Nash equilibrium faster than independently learning agents and, in some cases, even outperform centralized learning.

In summary, the paper introduces function approximation and network communication to address three limitations of existing MFG algorithms: handling large state spaces, representing population-dependent policies, and learning efficiently in a decentralized way. Together these make MFG methods more applicable in practice.
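Point 3 describes agents that estimate the global empirical distribution from a local neighbourhood and then improve the estimate by communicating with neighbours. The following is a minimal sketch of one simple scheme in that spirit, not the paper's algorithm. It makes several assumptions. States form a ring. Each agent sees the exact number of agents in every state within a hypothetical visibility `radius` of its own state. In each synchronous communication round, an agent copies a neighbour's count for any state it has no count for yet. Any probability mass still unaccounted for is spread uniformly over the states that remain unobserved. The function names, the ring topology and the NaN convention for "unknown" are all illustrative.

```python
import numpy as np

def local_counts(positions, n_states, radius):
    """Each agent's partial count vector: exact counts for states within cyclic
    distance `radius` of its own state, NaN for states it cannot see."""
    true_counts = np.bincount(positions, minlength=n_states).astype(float)
    est = np.full((len(positions), n_states), np.nan)
    states = np.arange(n_states)
    for i, x in enumerate(positions):
        d = np.abs(states - x)
        visible = np.minimum(d, n_states - d) <= radius
        est[i, visible] = true_counts[visible]
    return est

def communicate(est, neighbours, rounds):
    """Synchronous rounds: each agent fills in states it has no count for
    using its neighbours' counts from the previous round."""
    for _ in range(rounds):
        new = est.copy()
        for i, nbrs in enumerate(neighbours):
            for j in nbrs:
                missing = np.isnan(new[i])
                new[i, missing] = est[j, missing]
        est = new
    return est

def to_distribution(counts_row, n_agents):
    """Normalise known counts; spread the unaccounted mass uniformly over the
    states that are still unobserved."""
    known = ~np.isnan(counts_row)
    mu = np.zeros_like(counts_row)
    mu[known] = counts_row[known] / n_agents
    if (~known).any():
        leftover = max(0.0, 1.0 - mu[known].sum())
        mu[~known] = leftover / (~known).sum()
    return mu

# Usage: 8 agents on a ring of 10 states, communicating over a ring network.
positions = np.array([0, 0, 3, 5, 7, 7, 7, 9])
n_states, n_agents = 10, len(positions)
true_mu = np.bincount(positions, minlength=n_states) / n_agents

local = local_counts(positions, n_states, radius=1)
mu_local = to_distribution(local[0], n_agents)      # agent 0, no communication

ring = [[(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)]
shared = communicate(local, ring, rounds=n_agents)
mu_hat = to_distribution(shared[0], n_agents)       # agent 0, after communication
```

Copying a neighbour's value is safe here only because every count is an exact observation of the same time step. If agents exchanged information from different time steps, a scheme like this would need to track how recent each count is.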

Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation

Networked Communication for Decentralised Agents in Mean-Field Games

Learning Graphon Mean Field Games and Approximate Nash Equilibria

Learning Correlated Equilibria in Mean-Field Games

A Single Online Agent Can Efficiently Learn Mean Field Games

Imitation Learning for Mean Field Games with Correlated Equilibria

Mean-Field Learning: a Survey

Reinforcement Learning for Mean Field Game

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games

Distributed Nash Equilibrium Seeking over Time-Varying Directed Communication Networks

Learning distributed channel access policies for networked estimation: data-driven optimization in the mean-field regime

Game-Theoretic Distributed Empirical Risk Minimization With Strategic Network Design

Semantic Communication in Multi-team Dynamic Games: A Mean Field Perspective

Mean-field games among teams

General sum stochastic games with networked information flows

Efficient Distributed Learning in Stochastic Non-cooperative Games without Information Exchange

From Nash Equilibrium to Social Optimum and vice versa: a Mean Field Perspective

Mean-field games of speedy information access with observation costs

Model-free Reinforcement Learning for Non-stationary Mean Field Games

Scalable Decentralized Algorithms for Online Personalized Mean Estimation

Scalable Offline Reinforcement Learning for Mean Field Games