Abstract: Reinforcement learning algorithms for mean-field games offer a scalable framework for optimizing policies in large populations of interacting agents. Existing methods often depend on online interactions or access to system dynamics, limiting their practicality in real-world scenarios where such interactions are infeasible or difficult to model. In this paper, we present Offline Munchausen Mirror Descent (Off-MMD), a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data. By leveraging iterative mirror descent and importance sampling techniques, Off-MMD estimates the mean-field distribution from static datasets without relying on simulation or environment dynamics. Additionally, we incorporate techniques from offline reinforcement learning to address common issues like Q-value overestimation, ensuring robust policy learning even with limited data coverage. Our algorithm scales to complex environments and demonstrates strong performance on benchmark tasks like crowd exploration or navigation, highlighting its applicability to real-world multi-agent systems where online experimentation is infeasible. We empirically demonstrate the robustness of Off-MMD to low-quality datasets and conduct experiments to investigate its sensitivity to hyperparameter choices.
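
To make the importance-sampling idea in the abstract concrete, here is a minimal sketch of how a state distribution under a new policy might be estimated from logged trajectories alone. It is an illustration under stated assumptions, not the paper's estimator: the function name `estimate_mean_field`, the trajectory format, and the toy policies are all hypothetical, states and actions are assumed to be small integers, and the dependence of the transition dynamics on the mean field itself is ignored.

```python
def estimate_mean_field(trajectories, target_policy, behavior_policy, horizon, n_states):
    """Self-normalized importance-sampling estimate of the state distribution per time step.

    trajectories: list of trajectories, each a list of (state, action) pairs
                  of length >= horizon, collected under behavior_policy.
    target_policy, behavior_policy: functions (state, action) -> probability.
    Returns mean_field, where mean_field[t][s] estimates P(state = s at time t)
    if the population followed target_policy instead.
    """
    weights = [1.0] * len(trajectories)  # cumulative likelihood ratios, one per trajectory
    mean_field = []
    for t in range(horizon):
        # Weighted histogram of the logged states at time t.
        histogram = [0.0] * n_states
        for w, traj in zip(weights, trajectories):
            state, _ = traj[t]
            histogram[state] += w
        total = sum(histogram)
        mean_field.append([h / total for h in histogram] if total > 0 else histogram)
        # Fold the action taken at time t into each trajectory's weight.
        weights = [
            w * target_policy(traj[t][0], traj[t][1]) / behavior_policy(traj[t][0], traj[t][1])
            for w, traj in zip(weights, trajectories)
        ]
    return mean_field


# Toy data (hypothetical): two states, two actions, uniform behavior policy.
trajectories = [[(0, 0), (0, 0)], [(0, 1), (1, 1)]]
behavior = lambda s, a: 0.5
target = lambda s, a: 1.0 if a == 1 else 0.0  # target policy always picks action 1
mu = estimate_mean_field(trajectories, target, behavior, horizon=2, n_states=2)
```

In this toy case only the second trajectory is consistent with the target policy, so it carries all of the weight at the second time step. With real data the weights can have high variance when the target and behavior policies differ strongly, which is one reason limited data coverage matters.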

What problem does this paper attempt to address?

This paper aims to solve two key problems encountered when applying reinforcement learning (RL) algorithms to large-scale multi-agent systems:

1. **Limitations of online interactions**: Most existing reinforcement learning algorithms for mean-field games (MFGs) rely on online interactions with the environment or access to system dynamics. However, in many real-world applications that involve a large number of agents or humans, such as traffic routing, crowd dynamics, or recommendation systems, online interactions are either impractical or ethically infeasible. For example, in systems involving human agents, real-time data collection is often costly and invasive, and continuous exploration may lead to user dissatisfaction or safety risks.
2. **Scalability in complex environments**: Traditional multi-agent reinforcement learning (MARL) methods are difficult to scale to environments containing a large number of agents. As the number of agents increases, learning effective strategies may become computationally infeasible. Mean-field games provide a scalable way to handle this complexity by modeling the interaction between an individual agent and a statistical representation of the population (i.e., the mean field).

To solve these problems, the authors propose a new algorithm called **Offline Munchausen Mirror Descent (Off-MMD)**. Off-MMD is an offline reinforcement learning algorithm for mean-field games that can approximate equilibrium strategies using only static datasets. Specifically, the main contributions of this algorithm include:

- **Estimating the mean-field distribution using offline data**: Through iterative mirror descent and importance sampling techniques, Off-MMD can estimate the mean-field distribution from a static dataset without relying on simulation or environment dynamics.
- **Addressing the Q-value overestimation problem**: Techniques from offline reinforcement learning, such as Conservative Q-Learning, are introduced to ensure robust policy learning even in cases of limited data coverage.
- **Applicability to complex environments**: The algorithm scales to complex environments and performs well on benchmark tasks (such as crowd exploration or navigation), highlighting its applicability to real-world multi-agent systems.

In summary, this paper develops a new offline reinforcement learning algorithm to overcome the limitations of existing RL methods for mean-field games in practical applications, especially in scenarios where online interaction is not possible.
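
The Conservative Q-Learning idea mentioned in the contributions above can be sketched for a single state with a discrete action set. This is a minimal illustration of the standard CQL-style regularizer, not the paper's exact loss: the function names, the squared TD error, and the weight `alpha` are assumptions chosen for the example.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, dataset_action):
    """Conservative penalty for one state.

    Pushes down a soft maximum of Q over all actions while pushing up the
    Q-value of the action actually present in the dataset. It is always >= 0.
    """
    return logsumexp(q_values) - q_values[dataset_action]


def conservative_loss(q_values, dataset_action, td_target, alpha=1.0):
    """Squared TD error plus the weighted conservative penalty (alpha is an assumed weight)."""
    td_error = q_values[dataset_action] - td_target
    return 0.5 * td_error ** 2 + alpha * cql_penalty(q_values, dataset_action)


# Hypothetical Q-values: action 1 looks very good, but the dataset only contains action 0.
q = [1.0, 5.0]
penalty_unseen_high = cql_penalty(q, dataset_action=0)  # large: Q is high on an unseen action
penalty_seen_high = cql_penalty(q, dataset_action=1)    # small: the high value is backed by data
```

The penalty is large when the Q-function assigns high values to actions that never appear in the dataset, which is the overestimation pattern that offline methods try to suppress.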

Scalable Offline Reinforcement Learning for Mean Field Games

Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

Reinforcement Learning for Mean Field Game

A Single Online Agent Can Efficiently Learn Mean Field Games

Model-Free Reinforcement Learning for Mean Field Games

Model-free Reinforcement Learning for Non-stationary Mean Field Games

Reinforcement Learning for Non-stationary Discrete-Time Linear–Quadratic Mean-Field Games in Multiple Populations

Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path

Offline Fictitious Self-Play for Competitive Games

Offline Equilibrium Finding

MF-OML: Online Mean-Field Reinforcement Learning with Occupation Measures for Large Population Games

Agent-Level Maximum Entropy Inverse Reinforcement Learning for Mean Field Games

Reinforcement Learning for Finite Space Mean-Field Type Games

Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces

Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

Offline Decentralized Multi-Agent Reinforcement Learning

Offline Primal-Dual Reinforcement Learning for Linear MDPs

Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data

Efficient Online Reinforcement Learning with Offline Data