Independent Learning in Constrained Markov Potential Games

Philip Jordan, Anas Barakat, Niao He

2024-02-28

Abstract: Constrained Markov games offer a formal mathematical framework for modeling multi-agent reinforcement learning problems where the behavior of the agents is subject to constraints. In this work, we focus on the recently introduced class of constrained Markov Potential Games. While centralized algorithms have been proposed for solving such constrained games, the design of converging independent learning algorithms tailored for the constrained setting remains an open question. We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria: each agent observes their own actions and rewards, along with a shared state. Inspired by the optimization literature, our algorithm performs proximal-point-like updates augmented with a regularized constraint set. Each proximal step is solved inexactly using a stochastic switching gradient algorithm. Notably, our algorithm can be implemented independently without a centralized coordination mechanism requiring turn-based agent updates. Under some technical constraint qualification conditions, we establish convergence guarantees towards constrained approximate Nash equilibria. We perform simulations to illustrate our results.

Machine Learning, Computer Science and Game Theory, Multiagent Systems

What problem does this paper attempt to address?

This paper addresses the design of independent learning algorithms for Constrained Markov Potential Games (CMPGs). Constrained Markov games are a framework for multi-agent reinforcement learning (MARL) problems in which the agents' behavior is subject to constraints; CMPGs are the recently introduced subclass that extends Markov Potential Games (MPGs) with such constraints. Although centralized algorithms exist for solving these constrained games, designing an independent learning algorithm that provably converges has remained an open question. The authors propose an independent policy gradient algorithm in which each agent observes only its own actions and rewards, along with a shared state. The algorithm performs proximal-point-like updates augmented with a regularized constraint set, and each proximal step is solved inexactly with a stochastic switching gradient algorithm. It can be run without a centralized coordination mechanism and does not require turn-based agent updates. Under some technical constraint qualification conditions, the paper establishes convergence to approximate constrained Nash equilibria and provides a sample complexity analysis. Simulations illustrate the results. The study points out that this kind of independent learning protocol offers scalability, privacy protection, and low communication cost in multi-agent environments. In short, the paper aims to design an algorithm that learns independently while still guaranteeing convergence in CMPGs.
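To make the two-level structure concrete, here is a minimal sketch of a proximal-point outer loop whose subproblems are solved inexactly by a switching gradient method. It is not the paper's algorithm: it assumes a single decision variable instead of multiple agents, a hypothetical toy objective and constraint, and exact gradients in place of stochastic policy-gradient estimates. It also adds the proximal term to the objective only, so the way the paper regularizes the constraint set is not reproduced. All function names and step sizes are illustrative choices.

```python
# Toy problem (an assumption for illustration, not taken from the paper):
#   minimize f(x) = (x0 - 2)^2 + (x1 - 2)^2  subject to  g(x) = x0 + x1 - 2 <= 0.
# Its constrained minimizer is x = (1, 1).

def f_grad(x):
    """Gradient of the toy objective f."""
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 2.0)]

def g(x):
    """Toy constraint function; x is feasible when g(x) <= 0."""
    return x[0] + x[1] - 2.0

def g_grad(x):
    """Gradient of the toy constraint g."""
    return [1.0, 1.0]

def switching_gradient(center, eta=1.0, lr=0.01, tol=0.01, steps=2000):
    """Inexactly solve  min_x f(x) + ||x - center||^2 / (2 * eta)  s.t. g(x) <= 0."""
    x = list(center)
    total, count = [0.0, 0.0], 0
    for _ in range(steps):
        if g(x) <= tol:
            # Constraint (nearly) satisfied: record the iterate and take a
            # descent step on the proximal objective.
            total = [s + xi for s, xi in zip(total, x)]
            count += 1
            grad = [df + (xi - ci) / eta
                    for df, xi, ci in zip(f_grad(x), x, center)]
        else:
            # Constraint violated: switch to a descent step on the constraint.
            grad = g_grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    # Return the average of the recorded (nearly feasible) iterates.
    return [s / count for s in total] if count else x

def proximal_point(x0, outer_steps=5):
    """Outer loop: each step re-centers the proximal term at the last solution."""
    center = list(x0)
    for _ in range(outer_steps):
        center = switching_gradient(center)
    return center

solution = proximal_point([0.0, 0.0])
print(solution, g(solution))
```

On this toy problem the returned point should lie close to (1, 1) and violate the constraint by at most `tol`. The switching rule is what lets the inner solver handle the constraint without a dual variable: it only ever follows one gradient at a time, choosing which one based on the current constraint value.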

Independent and Decentralized Learning in Markov Potential Games

Convergence Rate of Primal-Dual Approach to Constrained Reinforcement Learning with Softmax Policy

Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games

Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains

Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games

Almost Sure Convergence of Networked Policy Gradient over Time-Varying Networks in Markov Potential Games

Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players

Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games

Independent Learning in Stochastic Games

Decentralized Policy Gradient for Nash Equilibria Learning of General-sum Stochastic Games

Reinforcement Learning for Multi-Objective and Constrained Markov Decision Processes

Provable Policy Gradient Methods for Average-Reward Markov Potential Games

A Scalable Game Theoretic Approach for Coordination of Multiple Dynamic Systems

Empirical Policy Optimization for $n$-Player Markov Games

A Randomized Inexact Proximal Best-Response Scheme for Potential Stochastic Nash Games

Learning to Control Unknown Strongly Monotone Games

Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains

Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing
