Filippos Christianos, Georgios Papoudakis, Stefano V. Albrecht
Abstract: This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal Nash equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), which is an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games): the Pareto-optimal equilibrium in a no-conflict game maximises the returns of all agents and, therefore, is the preferred outcome for all agents. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose PACDCG, a graph neural network extension of Pareto-AC, which is shown to efficiently scale in games with a large number of agents.
What problem does this paper attempt to address?
This paper addresses equilibrium selection in multi-agent reinforcement learning (MARL), specifically in no-conflict multi-agent games. When a game has several Nash equilibria, the goal is for the agents to converge to the Pareto-optimal one, the equilibrium that is best for every agent, rather than becoming trapped in a Pareto-dominated equilibrium.
### Main Problems of the Paper
1. **Sub-optimal Equilibrium Selection**: Many existing MARL algorithms are prone to converging to sub-optimal (Pareto-dominated) Nash equilibria. The cause is the uncertainty each agent has about the policies of the other agents during training.
2. **Risk-Averse Behavior**: In some games, such as the Stag Hunt, agents tend to adopt risk-averse strategies even when those strategies are not optimal. In the Stag Hunt, for example, an agent may choose a low-reward but safe option over a high-reward option that pays off only if the other agent cooperates.
3. **Coordination Problems**: Existing methods perform poorly on tasks that require a high degree of coordination between agents, especially in partially observable stochastic games.
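
To make the first two problems concrete, here is a minimal sketch that enumerates the pure-strategy Nash equilibria of a two-agent Stag Hunt and picks out the Pareto-optimal one. The payoff values are illustrative assumptions, not taken from the paper, and the helper names `is_nash` and `dominates` are hypothetical.

```python
from itertools import product

ACTIONS = ("stag", "hare")

# PAYOFFS[(a1, a2)] = (reward to agent 1, reward to agent 2).
# Illustrative values only: hunting the stag pays best, but only if both do it.
PAYOFFS = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (2, 2),
}

def is_nash(joint):
    """True if no agent can gain by unilaterally changing its own action."""
    for i in range(2):
        for alt in ACTIONS:
            deviated = list(joint)
            deviated[i] = alt
            if PAYOFFS[tuple(deviated)][i] > PAYOFFS[joint][i]:
                return False
    return True

def dominates(p, q):
    """True if payoff vector p Pareto-dominates payoff vector q."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))

nash = [j for j in product(ACTIONS, repeat=2) if is_nash(j)]
pareto_nash = [
    j for j in nash
    if not any(dominates(PAYOFFS[k], PAYOFFS[j]) for k in nash)
]

print("Nash equilibria:", nash)
print("Pareto-optimal Nash equilibria:", pareto_nash)
```

With these payoffs both (stag, stag) and (hare, hare) are equilibria, but (hare, hare) is Pareto-dominated. It is also the "safe" outcome, since choosing hare guarantees a reward of at least 2 whatever the other agent does.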
### Solutions
To address these problems, the authors propose the Pareto Actor-Critic (Pareto-AC) algorithm, which rests on the following ideas:
- **Utilizing the Characteristics of No-Conflict Games**: In a no-conflict game, the Pareto-optimal equilibrium maximises the returns of all agents, so it is the preferred outcome for every agent.
- **Modifying Advantage Estimation**: Pareto-AC uses a modified advantage estimate that guides the policy gradient towards the Pareto-optimal equilibrium. Specifically, each agent assumes that the other agents will choose the actions that lead to that same equilibrium.
- **Joint Action Computation**: This estimate requires computations over joint actions, whose number grows exponentially with the number of agents. The authors therefore also propose PACDCG (Pareto Actor-Critic with Deep Coordination Graphs), an extension based on graph neural networks that reduces this computational cost.
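
The intuition behind the modified advantage estimate can be sketched on a Stag Hunt. The code below is an illustration under stated assumptions, not the paper's estimator: the payoff values, the uniform partner policy, the mean-value baseline, and the function names are all hypothetical. It contrasts a value averaged over the partner's current policy with an optimistic value that assumes the partner picks the action that is best for the joint outcome.

```python
# Agent 1's value for each (own action, partner action) pair; illustrative values.
Q = {
    "stag": {"stag": 4.0, "hare": 0.0},
    "hare": {"stag": 3.0, "hare": 2.0},
}

# Assumption: early in training the partner acts uniformly at random.
partner_policy = {"stag": 0.5, "hare": 0.5}

def expected_value(own):
    """Value of `own` averaged over the partner's current policy."""
    return sum(partner_policy[b] * Q[own][b] for b in Q[own])

def optimistic_value(own):
    """Value of `own` assuming the partner picks the best matching action."""
    return max(Q[own].values())

def advantages(value_fn):
    """Advantage of each own action against a simple mean-value baseline."""
    values = {a: value_fn(a) for a in Q}
    baseline = sum(values.values()) / len(values)
    return {a: v - baseline for a, v in values.items()}

def num_joint_actions(n_actions, n_agents):
    """Size of the joint action space when every agent has n_actions choices."""
    return n_actions ** n_agents

expected_adv = advantages(expected_value)
optimistic_adv = advantages(optimistic_value)

print("expected  :", expected_adv)    # favours the safe action, hare
print("optimistic:", optimistic_adv)  # favours the Pareto-optimal action, stag
print("joint actions, 5 actions x 10 agents:", num_joint_actions(5, 10))
```

Under the averaged estimate the safe action has the higher advantage, which pushes the policy towards the Pareto-dominated equilibrium. Under the optimistic estimate the ordering flips. The last line shows why maximising over joint actions becomes expensive as agents are added, which is the cost PACDCG is meant to reduce.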
### Experimental Results
The paper evaluates Pareto-AC on a diverse set of multi-agent games, including matrix games and partially observable stochastic games. Pareto-AC converges to a Pareto-optimal equilibrium in a range of matrix games and reaches higher episodic returns than seven state-of-the-art MARL algorithms.
### Summary
The main contribution of this paper is Pareto-AC, a new deep multi-agent reinforcement learning algorithm that selects Pareto-optimal Nash equilibria in no-conflict games and avoids the tendency of existing methods to become trapped in sub-optimal equilibria. In addition, PACDCG mitigates the computational cost of reasoning over joint actions, making the approach applicable to environments with a larger number of agents.