Abstract: We address Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum (non-zero-sum) competition across different teams. To develop an RL method that provably achieves a Nash equilibrium, we focus on a linear-quadratic (LQ) structure. Moreover, to tackle the non-stationarity induced by multi-agent interactions in the finite population setting, we consider the case where the number of agents within each team is infinite, i.e., the mean-field setting. This results in a General-Sum LQ Mean-Field Type Game (GS-MFTG). We characterize the Nash equilibrium (NE) of the GS-MFTG under a standard invertibility condition. This MFTG NE is then shown to be an $\mathcal{O}(1/M)$-NE for the finite population game, where $M$ is a lower bound on the number of agents in each team. These structural results motivate an algorithm called Multi-player Receding-horizon Natural Policy Gradient (MRPG), where each team minimizes its cumulative cost independently in a receding-horizon manner. Despite the non-convexity of the problem, we establish that the resulting algorithm converges to a global NE through a novel decomposition of the problem into sub-problems using backward recursive discrete-time Hamilton-Jacobi-Isaacs (HJI) equations, in which independent natural policy gradient is shown to exhibit linear convergence under time-independent diagonal dominance. Experiments illuminate the merits of this approach in practice.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to reach a Nash equilibrium (NE) in a multi-agent system whose agents are divided into teams, with cooperation within each team but general-sum (non-zero-sum) competition between teams. Specifically:

1. **Problem Background**:
   - Multi-Agent Reinforcement Learning (MARL) has become increasingly popular for sequential decision-making problems among agents.
   - For purely cooperative environments, many algorithms and performance guarantees have been developed, but environments where agent goals may be opposed (such as traffic congestion, financial markets, and market negotiations) have received relatively little study.
   - Finding Nash equilibrium strategies in general-sum stochastic games is computationally intractable in general.
2. **Research Objectives**:
   - The authors study the Cooperative-Competitive (CC) team setting and ask under what conditions a Nash equilibrium can be achieved in it.
   - Specifically, they seek a data-driven method that reaches a general-sum Nash equilibrium in CC games.
3. **Methodology**:
   - To make the problem tractable, the authors make two structural assumptions:
     - The dynamics of the agents are linear and the cost is quadratic (the linear-quadratic, LQ, setting).
     - The number of agents in each team tends to infinity, so that the Mean-Field (MF) limit can be used as an approximation.
   - This setting results in a General-Sum LQ Mean-Field Type Game (GS-MFTG).
4. **Main Contributions**:
   - The authors formalize the CC game in the finite-agent LQ framework and derive its mean-field approximation as an MFTG. The NE of the MFTG is an $\mathcal{O}(1/M)$-NE of the finite-population game, where $M$ is the minimum number of agents in any team.
   - They develop the Multi-player Receding-horizon Natural Policy Gradient (MRPG) algorithm to learn the NE of the GS-MFTG.
   - By decomposing the problem into simpler per-time-step sub-problems, they show that MRPG converges to the global NE at a linear rate under a time-independent diagonal dominance condition.

In summary, this paper addresses the Nash equilibrium problem in multi-agent environments where cooperation and competition coexist, proposes a data-driven method based on mean-field theory, and proves the convergence of this method.
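The receding-horizon idea summarized above can be illustrated with a minimal sketch. It is not the paper's MRPG algorithm: it assumes a small finite-player, noise-free LQ game with known matrices (so gradients are computed exactly rather than estimated from data), it omits the mean-field term, and the function name `receding_horizon_npg` and all parameter values are hypothetical. Working backward in time, each player runs independent natural-gradient steps on the quadratic sub-problem for the current time step, and the cost-to-go matrices are then propagated one step back.

```python
import numpy as np

def receding_horizon_npg(A, B, Q, R, T, eta=0.1, iters=500):
    """Illustrative backward-in-time natural policy gradient for an LQ game.

    Dynamics: x_{t+1} = A x_t + sum_i B[i] u_t^i, with u_t^i = -K_t^i x_t.
    Player i's stage cost: x' Q[i] x + u^i' R[i] u^i (terminal cost x' Q[i] x).
    B, Q, R are lists with one matrix per player.
    """
    n = A.shape[0]
    N = len(B)
    P = [Q[i].copy() for i in range(N)]      # cost-to-go matrices at time T
    gains = []
    for _ in range(T):                        # time steps T-1, T-2, ..., 0
        K = [np.zeros((B[i].shape[1], n)) for i in range(N)]
        for _ in range(iters):
            A_cl = A - sum(B[j] @ K[j] for j in range(N))
            # Natural gradient of player i's one-step sub-problem:
            # the plain gradient with the state covariance factor removed.
            grads = [2 * (R[i] @ K[i] - B[i].T @ P[i] @ A_cl) for i in range(N)]
            # Simultaneous, independent updates: each player uses only its own gradient.
            K = [K[i] - eta * grads[i] for i in range(N)]
        A_cl = A - sum(B[j] @ K[j] for j in range(N))
        # Propagate each player's cost-to-go one step backward.
        P = [Q[i] + K[i].T @ R[i] @ K[i] + A_cl.T @ P[i] @ A_cl for i in range(N)]
        gains.append(K)
    gains.reverse()                           # gains[t][i] is K_t^i
    return gains, P

# Hypothetical scalar two-player example with weak coupling between players.
A = np.array([[0.9]])
B = [np.array([[0.5]]), np.array([[0.3]])]
Q = [np.eye(1), np.eye(1)]
R = [np.eye(1), np.eye(1)]
gains, P = receding_horizon_npg(A, B, Q, R, T=3)
```

Each per-step sub-problem is quadratic in a player's own gain, so the simultaneous updates form a linear iteration. In this sketch it contracts because the cross-player coupling is weak relative to each player's own curvature, which is the role a diagonal dominance condition plays; with strong coupling or a large step size `eta` the iteration can diverge.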

Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

Reinforcement Learning for Non-stationary Discrete-Time Linear–Quadratic Mean-Field Games in Multiple Populations

Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

A Generalized Training Approach for Multiagent Learning

A Single Online Agent Can Efficiently Learn Mean Field Games

MF-OML: Online Mean-Field Reinforcement Learning with Occupation Measures for Large Population Games

Sample-Efficient Multi-Agent RL: An Optimization Perspective

Model-Free Reinforcement Learning for Mean Field Games

Reinforcement Learning for Mean Field Game

Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains

Model-free Reinforcement Learning for Non-stationary Mean Field Games

A General Framework for Learning Mean-Field Games

Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL

Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

Learning in Mean Field Games: A Survey

Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games

Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field Games

Bounded Rationality Equilibrium Learning in Mean Field Games

Empirical Policy Optimization for n-Player Markov Games