Independent Learning in Constrained Markov Potential Games

Philip Jordan, Anas Barakat, Niao He
2024-02-28
Abstract: Constrained Markov games offer a formal mathematical framework for modeling multi-agent reinforcement learning problems where the behavior of the agents is subject to constraints. In this work, we focus on the recently introduced class of constrained Markov Potential Games. While centralized algorithms have been proposed for solving such constrained games, the design of converging independent learning algorithms tailored for the constrained setting remains an open question. We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria: Each agent observes their own actions and rewards, along with a shared state. Inspired by the optimization literature, our algorithm performs proximal-point-like updates augmented with a regularized constraint set. Each proximal step is solved inexactly using a stochastic switching gradient algorithm. Notably, our algorithm can be implemented independently without a centralized coordination mechanism requiring turn-based agent updates. Under some technical constraint qualification conditions, we establish convergence guarantees towards constrained approximate Nash equilibria. We perform simulations to illustrate our results.
Machine Learning, Computer Science and Game Theory, Multiagent Systems
What problem does this paper attempt to address?
This paper addresses the design of independent learning algorithms for Constrained Markov Potential Games (CMPGs). Constrained Markov games are a mathematical framework for multi-agent reinforcement learning (MARL) problems in which the agents' behavior is subject to constraints; CMPGs are the recently introduced subclass that extends Markov Potential Games (MPGs) with such constraints. Although centralized algorithms exist for solving these constrained games, designing an independent learning algorithm that converges without a centralized coordination mechanism has remained an open problem. The authors propose an independent policy gradient algorithm in which each agent observes only its own actions and rewards, along with a shared state. The algorithm performs proximal-point-like updates augmented with a regularized constraint set, and each proximal step is solved inexactly with a stochastic switching gradient algorithm, so no turn-based agent updates are required. Under some technical constraint qualification conditions, the paper proves convergence to approximate constrained Nash equilibria and provides a sample complexity analysis, and it illustrates the results with simulations. The study also points out that this independent learning protocol offers scalability, privacy protection, and low communication cost in multi-agent environments. In summary, the paper aims to design an algorithm that learns independently and still comes with convergence guarantees in Constrained Markov Potential Games.
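To make the idea of a switching gradient step more concrete, here is a minimal sketch on a toy Euclidean problem. It is not the paper's algorithm: the paper applies the idea inside each proximal step, in policy space, with gradients estimated from samples. The function name `switching_gradient`, the tolerance `delta`, the step size `eta`, the Gaussian gradient noise, and the toy objective and constraint are all assumptions chosen for illustration. The rule shown is the generic one: take a noisy gradient step on the objective when the constraint is approximately satisfied, otherwise take a step that reduces the constraint violation, and average only the iterates at which an objective step was taken.

```python
import random

def switching_gradient(grad_f, g, grad_g, x0, eta=0.01, delta=0.05,
                       steps=5000, noise=0.1, seed=0):
    """Toy stochastic switching gradient method for min f(x) s.t. g(x) <= 0."""
    rng = random.Random(seed)
    x = list(x0)
    kept_sum = [0.0] * len(x)  # running sum of near-feasible iterates
    kept = 0
    for _ in range(steps):
        if g(x) <= delta:
            # Constraint (approximately) satisfied: descend on the objective
            # and record this iterate for the final average.
            direction = grad_f(x)
            kept_sum = [s + xi for s, xi in zip(kept_sum, x)]
            kept += 1
        else:
            # Constraint violated: descend on the constraint function instead.
            direction = grad_g(x)
        # Gaussian noise stands in for stochastic gradient estimates.
        x = [xi - eta * (d + rng.gauss(0.0, noise))
             for xi, d in zip(x, direction)]
    if kept == 0:
        raise ValueError("no near-feasible iterate was visited")
    return [s / kept for s in kept_sum]

# Hypothetical toy instance: minimize ||x - c||^2 subject to x[0] + x[1] <= 2.
c = (2.0, 2.0)
grad_f = lambda x: [2.0 * (xi - ci) for xi, ci in zip(x, c)]
g = lambda x: x[0] + x[1] - 2.0
grad_g = lambda x: [1.0, 1.0]

x_avg = switching_gradient(grad_f, g, grad_g, x0=[0.0, 0.0])
```

In this toy instance the constrained minimizer is the projection of `c` onto the half-plane, namely (1, 1), and the averaged iterate should land close to it. The appeal of the switching rule for independent learning is that it needs only first-order information about the objective and the constraint, with no projection onto the constraint set.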