Abstract: Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks, demonstrating impressive performance and scalability. However, deploying MARL agents in real-world applications raises critical safety challenges. Current safe MARL algorithms are largely based on the constrained Markov decision process (CMDP) framework, which constrains only the discounted cumulative cost and therefore provides no guarantee of safety at every time step. Moreover, these methods often overlook the feasibility issue: from certain states inside the constraint set, the system will inevitably violate the state constraints later on. Ignoring this leads to either suboptimal performance or more constraint violations. To address these challenges, we propose a novel theoretical framework for safe MARL with $\textit{state-wise}$ constraints, in which safety requirements are enforced at every state the agents visit. To resolve the feasibility issue, we leverage a control-theoretic notion of the feasible region, the controlled invariant set (CIS), which is characterized by the safety value function. We develop a multi-agent method for identifying CISs that is guaranteed to converge to a Nash equilibrium with respect to the safety value function. By incorporating CIS identification into the learning process, we introduce a multi-agent dual policy iteration algorithm that guarantees convergence to a generalized Nash equilibrium in state-wise constrained cooperative Markov games, achieving an optimal balance between feasibility and performance. Furthermore, for practical deployment in complex high-dimensional systems, we propose $\textit{Multi-Agent Dual Actor-Critic}$ (MADAC), a safe MARL algorithm that approximates the proposed iteration scheme within the deep RL paradigm. Empirical evaluations on safe MARL benchmarks show that MADAC consistently outperforms existing methods, achieving higher rewards with fewer constraint violations.
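The two ideas the abstract relies on, that a discounted cost budget can hide a state-wise violation and that a safety value function characterizes a controlled invariant set, can be made concrete with a small sketch. It is a toy illustration under stated assumptions, not the paper's CIS identification method or MADAC: the constraint is written as h(s) <= 0, the safety value is taken to be the common reachability-style form V(s) = min over policies of max over t of h(s_t), computed by centralized tabular value iteration over the agents' joint actions, and the two-agent cart (`X_MAX`, `V_MAX`, `WALL`, `step`) as well as the cost budget are hypothetical.

```python
from itertools import product

# ---- Part 1: discounted cumulative cost vs. state-wise constraint ----
def discounted_cost(costs, gamma=0.9):
    """CMDP-style quantity: sum_t gamma^t * c_t."""
    return sum(gamma ** t * c for t, c in enumerate(costs))

def statewise_satisfied(costs):
    """State-wise constraint: the cost must be zero at every visited state."""
    return all(c == 0 for c in costs)

trace = [0, 0, 1, 0, 0]   # a single unsafe state at t = 2
budget = 1.0              # hypothetical CMDP cost budget
cmdp_ok = discounted_cost(trace) <= budget      # ~0.81 <= 1.0
statewise_ok = statewise_satisfied(trace)       # violated at t = 2

# ---- Part 2: CIS from a safety value function (toy, centralized) ----
X_MAX, V_MAX, WALL = 10, 3, 7   # hypothetical cart: position, speed, wall
ACTIONS = (-1, 0, 1)            # each agent's acceleration contribution
STATES = list(product(range(X_MAX + 1), range(V_MAX + 1)))

def h(s):
    """Constraint function; the state constraint is h(s) <= 0, i.e. x <= WALL."""
    x, _ = s
    return x - WALL

def step(s, joint_action):
    """Agents push one cart: v' = clip(v + sum(a)), x' = min(x + v', X_MAX)."""
    x, v = s
    v_next = max(0, min(V_MAX, v + sum(joint_action)))
    return (min(X_MAX, x + v_next), v_next)

def safety_value(states, n_agents=2):
    """Iterate V(s) = max(h(s), min_a V(f(s, a))) from V = h to a fixed point.

    The minimum is taken over joint actions (a cooperative, centralized
    simplification). The iterates are nondecreasing and take values in the
    finite set of h-values, so the loop terminates.
    """
    V = {s: h(s) for s in states}
    joint_actions = list(product(ACTIONS, repeat=n_agents))
    while True:
        V_new = {s: max(h(s), min(V[step(s, a)] for a in joint_actions))
                 for s in states}
        if V_new == V:
            return V
        V = V_new

constraint_set = {s for s in STATES if h(s) <= 0}

V = safety_value(STATES, n_agents=2)
cis = {s for s in STATES if V[s] <= 0}
infeasible = sorted(constraint_set - cis)   # safe now, but doomed to violate

V1 = safety_value(STATES, n_agents=1)
cis_single = {s for s in STATES if V1[s] <= 0}
infeasible_single = sorted(constraint_set - cis_single)

print(cmdp_ok, statewise_ok)   # True False
print(infeasible)              # [(7, 3)]
print(infeasible_single)       # [(5, 3), (6, 3), (7, 2), (7, 3)]
```

In this toy setting, the trace satisfies the discounted budget while violating the state-wise constraint, and the states in the constraint set but outside the CIS are exactly those from which a violation is unavoidable. Because both agents can brake together, the two-agent CIS is larger than the single-agent one.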

MatrixWorld: A pursuit-evasion platform for safe multi-agent coordination and autocurricula

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding

MARNet: Backdoor Attacks Against Cooperative Multi-Agent Reinforcement Learning

Large Scale Pursuit-Evasion under Collision Avoidance Using Deep Reinforcement Learning

Safe Multiagent Learning with Soft Constrained Policy Optimization in Real Robot Control

An Improved Approach Towards Multi-Agent Pursuit–Evasion Game Decision-Making Using Deep Reinforcement Learning

Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium

Pursuit-Evasion Games for Multi-agent Based on Reinforcement Learning with Obstacles

Safe Multi-Agent Reinforcement Learning for Multi-Robot Control

A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning

Multi-Agent Constrained Policy Optimisation

Safe Multi-Agent Reinforcement Learning through Decentralized Multiple Control Barrier Functions

Learning Adaptive Safety for Multi-Agent Systems

Multi-Agent Cooperative-Competitive Environment with Reinforcement Learning

Multi-Agent Reinforcement Learning with Control-Theoretic Safety Guarantees for Dynamic Network Bridging

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Diffusion Models for Offline Multi-agent Reinforcement Learning with Safety Constraints

Co-Evolving Multi-Agent Transfer Reinforcement Learning Via Scenario Independent Representation

NeuronsMAE: A Novel Multi-Agent Reinforcement Learning Environment for Cooperative and Competitive Multi-Robot Tasks

A Pursuit-Evasion Game on a Real-City Virtual Simulation Platform Based on Multi-Agent Reinforcement Learning

A Survey of Progress on Cooperative Multi-agent Reinforcement Learning in Open Environment