Abstract: Many economic games and machine learning approaches can be cast as competitive optimization problems where multiple agents are minimizing their respective objective functions, which depend on all agents' actions. While gradient descent is a reliable basic workhorse for single-agent optimization, it often leads to oscillation in competitive optimization. In this work we propose polymatrix competitive gradient descent (PCGD) as a method for solving general-sum competitive optimization involving arbitrary numbers of agents. The updates of our method are obtained as the Nash equilibria of a local polymatrix approximation with a quadratic regularization, and can be computed efficiently by solving a linear system of equations. We prove local convergence of PCGD to stable fixed points for $n$-player general-sum games, and show that it does not require adapting the step size to the strength of the player interactions. We use PCGD to optimize policies in multi-agent reinforcement learning and demonstrate its advantages in Snake, Markov soccer and an electricity market game. Agents trained by PCGD outperform agents trained with simultaneous gradient descent, symplectic gradient adjustment, and extragradient in Snake and Markov soccer games. On the electricity market game, PCGD trains faster than both simultaneous gradient descent and the extragradient method.
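The sentence about obtaining each update as a Nash equilibrium and computing it with a linear system can be made concrete with a short derivation. This is a sketch of our reading of the abstract, not a statement taken from the paper: we assume each player $i$ picks a step $\Delta x_i$ that minimizes its first-order term, the pairwise bilinear interaction terms of the polymatrix approximation, and a quadratic penalty with step size $\eta$. The notation $\xi$ (stacked own-gradients) and $H_o$ (cross-player second derivatives) is introduced here for illustration.

```latex
% Player i's regularized local polymatrix problem
% (terms that do not involve \Delta x_i are dropped; they do not change player i's choice)
\min_{\Delta x_i}\; \Delta x_i^\top \nabla_{x_i} f_i
  + \sum_{j \neq i} \Delta x_i^\top D^2_{x_i x_j} f_i \, \Delta x_j
  + \frac{1}{2\eta} \lVert \Delta x_i \rVert^2

% Nash equilibrium: the first-order condition holds for every player simultaneously
\nabla_{x_i} f_i + \sum_{j \neq i} D^2_{x_i x_j} f_i \, \Delta x_j + \frac{1}{\eta} \Delta x_i = 0,
  \qquad i = 1, \dots, n

% Stack with \xi = (\nabla_{x_1} f_1, \dots, \nabla_{x_n} f_n),
% (H_o)_{ij} = D^2_{x_i x_j} f_i for i \neq j and (H_o)_{ii} = 0:
(I + \eta H_o)\, \Delta x = -\eta\, \xi
\quad\Longrightarrow\quad
\Delta x = -\eta\, (I + \eta H_o)^{-1} \xi
```

The quadratic term makes each player's problem strongly convex in $\Delta x_i$, so the first-order condition characterizes its best response; the stacked system has a unique solution whenever $I + \eta H_o$ is invertible, which is assumed here. For $n = 2$ this has the same form as the competitive gradient descent (CGD) update.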

What problem does this paper attempt to address?

The paper addresses how to perform competitive optimization effectively in multi-agent settings. Specifically:

1. **Competitive optimization in multi-agent settings**: Many economic games and machine-learning methods can be modeled as competitive optimization problems in which each agent minimizes its own objective function, and each objective depends on the actions of all agents. In this setting, plain gradient descent (as in simultaneous gradient descent, SimGD) often oscillates instead of converging to a stable solution.
2. **Limitations of existing methods**: Existing methods such as SimGD, symplectic gradient adjustment (SGA), and extragradient must adapt their step size to the strength of the interactions between agents, which limits their convergence speed and stability.
3. **Multi-agent reinforcement learning**: Designing optimization methods that converge stably is a fundamental challenge in multi-agent reinforcement learning, and methods that do not account for the interactions between agents may perform poorly.

To address these problems, the paper proposes **Polymatrix Competitive Gradient Descent (PCGD)**. Each PCGD update is the Nash equilibrium of a local polymatrix approximation of the game with quadratic regularization, and this equilibrium can be computed efficiently by solving a linear system of equations. The main contributions are:

- **Theoretical contributions**: The paper proves that PCGD converges locally to stable fixed points in $n$-player general-sum games and that it does not require adapting the step size to the strength of the interactions between players.
- **Experimental verification**: In a four-player Snake game and in Markov soccer, agents trained by PCGD outperform agents trained by SimGD, SGA, and extragradient; in an electricity market game, PCGD trains faster than SimGD and extragradient.

In conclusion, the paper aims to provide a more effective method for multi-agent competitive optimization that overcomes the limitations of existing methods under strong interactions, and it demonstrates the method's performance in practical applications.
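To illustrate the step-size point, here is a minimal sketch, not the authors' implementation, comparing SimGD with a PCGD-style step on a hypothetical three-player quadratic game with one scalar action per player. It assumes the PCGD update takes the form $x \leftarrow x - \eta (I + \eta H_o)^{-1} \xi$, where $\xi$ stacks each player's gradient of its own loss with respect to its own action and $H_o$ holds the cross-player second derivatives $\partial^2 f_i / \partial x_i \partial x_j$ for $j \neq i$, with zeros on the diagonal. The game, the self-term `mu`, the interaction strength `s`, and all function names are hypothetical choices for illustration.

```python
import numpy as np


def make_game(mu=0.1, s=5.0):
    """Hypothetical 3-player game with one scalar action per player:
    f_i(x) = (mu / 2) * x_i**2 + x_i * sum_j A[i, j] * x_j, with A antisymmetric."""
    C = np.array([[0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, -1.0, 0.0]])
    return mu, s * C  # s scales the strength of the player interactions


def own_gradients(x, mu, A):
    # xi_i = d f_i / d x_i (A has a zero diagonal, so there is no extra self term)
    return mu * x + A @ x


def simgd_step(x, eta, mu, A):
    # Simultaneous gradient descent: every player follows its own gradient
    return x - eta * own_gradients(x, mu, A)


def pcgd_step(x, eta, mu, A):
    # Jacobian of xi; for scalar actions, entry (i, j) is d^2 f_i / dx_i dx_j
    J = mu * np.eye(len(x)) + A
    # H_o keeps only the cross-player terms (i != j)
    H_o = J - np.diag(np.diag(J))
    # Solve (I + eta * H_o) dx = -eta * xi rather than forming the inverse
    dx = np.linalg.solve(np.eye(len(x)) + eta * H_o, -eta * own_gradients(x, mu, A))
    return x + dx


def run(step, iters=500, eta=0.1):
    # Iterate one update rule from a fixed start and return the final distance to 0
    mu, A = make_game()
    x = np.array([1.0, -0.5, 0.25])
    for _ in range(iters):
        x = step(x, eta, mu, A)
    return np.linalg.norm(x)


print(f"SimGD final norm: {run(simgd_step):.3e}")
print(f"PCGD  final norm: {run(pcgd_step):.3e}")
```

With `eta = 0.1`, `mu = 0.1`, and `s = 5`, the SimGD iterates grow without bound while the PCGD-style iterates shrink toward the fixed point at the origin. In this linear toy game the per-step factors are $|1 - \eta\mu - i\eta\omega|$ for SimGD and $|1 - \eta\mu| / |1 + i\eta\omega|$ for the PCGD-style step, where $\pm i\omega$ are the eigenvalues of `A`. SimGD therefore needs $\eta < 2\mu / (\mu^2 + \omega^2)$, roughly 0.0027 here, whereas the PCGD-style step only needs $0 < \eta\mu < 2$, independent of `s`.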
