Abstract: This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal Nash equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), which is an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games): the Pareto-optimal equilibrium in a no-conflict game maximises the returns of all agents and, therefore, is the preferred outcome for all agents. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose PACDCG, a graph neural network extension of Pareto-AC, which is shown to efficiently scale in games with a large number of agents.
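
To make the abstract's central property concrete, the following minimal sketch enumerates the pure Nash equilibria of a two-agent Stag Hunt and keeps the one that is not Pareto-dominated. The payoff values and the helper names `pure_nash_equilibria` and `pareto_optimal` are illustrative assumptions, not taken from the paper.

```python
import itertools
import numpy as np

# Illustrative Stag Hunt payoffs (assumed values, not from the paper).
# Action 0 = hunt stag, action 1 = hunt hare.
# payoffs[i][a0, a1] is the reward of agent i for the joint action (a0, a1).
payoffs = [
    np.array([[4, 0], [3, 2]]),  # agent 0
    np.array([[4, 3], [0, 2]]),  # agent 1
]

def pure_nash_equilibria(payoffs):
    """Return joint actions from which no single agent gains by deviating."""
    shape = payoffs[0].shape
    equilibria = []
    for joint in itertools.product(*(range(n) for n in shape)):
        stable = True
        for i, payoff in enumerate(payoffs):
            for alternative in range(shape[i]):
                deviation = list(joint)
                deviation[i] = alternative
                if payoff[tuple(deviation)] > payoff[joint]:
                    stable = False
        if stable:
            equilibria.append(joint)
    return equilibria

def pareto_optimal(equilibria, payoffs):
    """Keep the equilibria that no other equilibrium Pareto-dominates."""
    def dominates(x, y):
        rx = [p[x] for p in payoffs]
        ry = [p[y] for p in payoffs]
        no_worse = all(a >= b for a, b in zip(rx, ry))
        better = any(a > b for a, b in zip(rx, ry))
        return no_worse and better
    return [e for e in equilibria
            if not any(dominates(other, e) for other in equilibria)]

equilibria = pure_nash_equilibria(payoffs)
best = pareto_optimal(equilibria, payoffs)
print("pure Nash equilibria:", equilibria)
print("Pareto-optimal equilibria:", best)
```

With these assumed payoffs, both (stag, stag) and (hare, hare) are equilibria, but only (stag, stag) gives every agent its highest return, which is the outcome the abstract says all agents prefer.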

What problem does this paper attempt to address?

This paper addresses the problem of selecting Pareto-optimal Nash equilibria in multi-agent reinforcement learning (MARL), specifically in no-conflict multi-agent games. It asks how agents can select the equilibrium that is best for all of them among several possible Nash equilibria, rather than getting trapped in a sub-optimal one.

### Main Problems of the Paper

1. **Sub-optimal Equilibrium Selection**: Many existing MARL algorithms are prone to converging to sub-optimal (Pareto-dominated) Nash equilibria, because during training each agent is uncertain about the policies of the other agents.
2. **Risk-Averse Behavior**: In some games, such as the Stag Hunt, agents tend to choose risk-averse strategies even when these are not optimal. For example, an agent may choose a low-reward but safer option instead of a high-reward but risky one.
3. **Coordination Problems**: Existing methods perform poorly on tasks that require a high degree of coordination between agents, especially in partially observable stochastic games.

### Solutions

To address these problems, the authors propose the Pareto Actor-Critic (Pareto-AC) algorithm, which works as follows:

- **Utilizing the Characteristics of No-Conflict Games**: In no-conflict games, all agents prefer the Pareto-optimal equilibrium because it maximises the returns of every agent.
- **Modifying Advantage Estimation**: Pareto-AC uses a modified advantage estimate to guide the policy gradient towards Pareto-optimal equilibria. Specifically, each agent assumes that the other agents will choose actions that lead to the same equilibrium.
- **Joint Action Computation**: To do this, Pareto-AC performs computations over joint actions, whose cost grows exponentially with the number of agents. For this reason, the authors also propose PACDCG (Pareto Actor-Critic with Deep Coordination Graphs), an extension based on graph neural networks that reduces this computational cost.

### Experimental Results

The paper evaluates Pareto-AC on a range of multi-agent games, including matrix games and partially observable stochastic games. The experiments show that Pareto-AC converges to Pareto-optimal equilibria in the matrix games and reaches higher episodic returns than seven state-of-the-art MARL algorithms.

### Summary

The main contribution of this paper is a new deep multi-agent reinforcement learning algorithm, Pareto-AC, which selects Pareto-optimal Nash equilibria and avoids the tendency of existing methods to get trapped in sub-optimal ones. In addition, PACDCG mitigates the computational cost of reasoning over joint actions, making the approach applicable to games with a larger number of agents.
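
The risk-averse failure mode and the modified advantage estimate described above can be illustrated with a small sketch. It is one possible reading of the idea, not the authors' implementation: the joint action values in `q`, the uniform partner policy, and the choice to evaluate each action under the partner action that maximises the joint value are all assumptions made for illustration, as are the function names.

```python
import numpy as np

# Assumed joint action values for agent 0 in a two-agent Stag Hunt.
# q[a0, a1]; action 0 = stag, action 1 = hare. Illustrative numbers only.
q = np.array([[4.0, 0.0],
              [3.0, 2.0]])

# Early in training the partner is assumed to act almost at random.
partner_policy = np.array([0.5, 0.5])
own_policy = np.array([0.5, 0.5])

def expected_values(q, partner_policy):
    """Value of each own action, averaged over the partner's current policy."""
    return q @ partner_policy

def optimistic_values(q):
    """Value of each own action if the partner picks the best matching action."""
    return q.max(axis=1)

def advantages(values, own_policy):
    """Advantage of each own action relative to the policy-weighted baseline."""
    return values - values @ own_policy

def num_joint_actions(n_actions, n_agents):
    """Number of joint actions when every agent has n_actions choices."""
    return n_actions ** n_agents

standard = advantages(expected_values(q, partner_policy), own_policy)
optimistic = advantages(optimistic_values(q), own_policy)

print("standard advantages:  ", standard)    # favours hare (the safe action)
print("optimistic advantages:", optimistic)  # favours stag (the Pareto-optimal action)
print("joint actions, 5 actions and 10 agents:", num_joint_actions(5, 10))
```

Under these assumptions the standard estimate pushes the policy towards the safe action, while the optimistic estimate pushes it towards the action that belongs to the Pareto-optimal equilibrium. The last line shows why a maximisation over all joint actions becomes impractical as agents are added, which is the motivation given for PACDCG.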

Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
