Abstract: We introduce anytime constraints to the multi-agent setting with the corresponding solution concept being anytime-constrained equilibrium (ACE). Then, we present a comprehensive theory of anytime-constrained Markov games, which includes (1) a computational characterization of feasible policies, (2) a fixed-parameter tractable algorithm for computing ACE, and (3) a polynomial-time algorithm for approximately computing feasible ACE. Since computing a feasible policy is NP-hard even for two-player zero-sum games, our approximation guarantees are the best possible under worst-case analysis. We also develop the first theory of efficient computation for action-constrained Markov games, which may be of independent interest.

What problem does this paper attempt to address?

This paper addresses anytime constraints in multi-agent reinforcement learning (MARL). Specifically, the authors introduce a new solution concept, the anytime-constrained equilibrium (ACE), and develop a comprehensive theory of it for the multi-agent setting.

### Main problems of the paper

The core question of the paper is: **In which categories of constrained multi-agent Markov games (cMGs) can the (approximate) anytime-constrained equilibrium (ACE) be computed in polynomial time?**

### Detailed explanation

1. **Background and motivation**:
   - Constraints are very important in real-world applications, but the existing multi-agent reinforcement learning (MARL) literature lags far behind the single-agent setting in handling them.
   - Anytime constraints are especially relevant to scenarios such as autonomous vehicles, which must balance safety against other goals (such as efficiency).
   - Although anytime constraints have been studied in the single-agent setting, they have not yet been studied systematically in the multi-agent setting.
2. **Main contributions**:
   - **Computational characterization**: The paper gives a computational characterization of feasible policies.
   - **Fixed-parameter tractable algorithm**: It designs a fixed-parameter tractable (FPT) algorithm for computing subgame-perfect ACE.
   - **Polynomial-time approximation algorithm**: It provides a polynomial-time algorithm for approximately computing feasible subgame-perfect ACE.
   - **Efficient algorithm for action-constrained games**: It develops a theory of efficient computation for action-constrained Markov games (action-constrained MGs), which may be of independent interest.
3. **Technical difficulties**:
   - Computing a feasible policy is NP-hard even for simple two-player zero-sum games.
   - Policies that satisfy a constraint only in expectation may severely violate anytime constraints, so standard expected-constraint methods fail in this setting.
   - Distributed learning and self-play methods usually depend on the choices of other players, which makes feasibility difficult to guarantee.
4. **Solutions**:
   - The problem is transformed into an action-constrained Markov game, which is then solved through a sequence of stage games using linear programming (LP).
   - For the approximation algorithm, costs are truncated and rounded to derive an approximate game whose solution is an approximately feasible equilibrium of the original cMG.

### Summary

This paper aims to fill the gap in research on anytime constraints in multi-agent reinforcement learning. It proposes a theoretical framework and algorithmic tools that compute ACE with a fixed-parameter tractable algorithm or approximate it in polynomial time. This provides new ideas and methods for solving multi-agent decision-making problems in complex real-world situations.
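
The gap between expected constraints and anytime constraints noted under the technical difficulties can be made concrete with a small sketch. Everything in it is a hypothetical toy rather than the paper's construction: a single decision-maker, a horizon of 2, a budget of 1, and two actions with nonnegative costs. It shows that a mixed policy can meet the budget in expectation while still exceeding it on some trajectories, and how augmenting the state with the cumulative cost turns the budget into a restriction on the allowed actions.

```python
from itertools import product

# Hypothetical toy instance (not from the paper): per-action costs,
# horizon, and budget are all illustrative assumptions.
COST = {"safe": 0.0, "risky": 1.0}
HORIZON = 2
BUDGET = 1.0


def trajectory_stats(policy):
    """Return (expected total cost, probability the running cost ever exceeds BUDGET).

    `policy` maps each action to the probability of playing it at every step.
    """
    expected_cost = 0.0
    violation_prob = 0.0
    for actions in product(COST, repeat=HORIZON):
        # Probability of this action sequence under the stationary mixed policy.
        prob = 1.0
        for a in actions:
            prob *= policy[a]
        # Track the running cost and whether any prefix breaks the budget.
        running, violated = 0.0, False
        for a in actions:
            running += COST[a]
            violated = violated or running > BUDGET
        expected_cost += prob * running
        violation_prob += prob * violated
    return expected_cost, violation_prob


def allowed_actions(cumulative_cost):
    """Actions available in the augmented state (state, cumulative cost).

    Assumes nonnegative costs, so an action is allowed exactly when it
    keeps the running cost within BUDGET.
    """
    return [a for a in COST if cumulative_cost + COST[a] <= BUDGET]


mixed = {"safe": 0.5, "risky": 0.5}
exp_cost, p_violate = trajectory_stats(mixed)
print(exp_cost, p_violate)   # 1.0 0.25
print(allowed_actions(0.0))  # ['safe', 'risky']
print(allowed_actions(1.0))  # ['safe']
```

The mixed policy has expected cost 1.0, which meets the budget, yet it plays "risky" twice with probability 0.25 and then spends 2. Restricting actions by the cumulative cost rules that trajectory out by construction. In the multi-agent setting each player would carry its own cumulative cost, and with general costs the allowed set would also need to account for whether feasibility can be maintained in future steps; neither is modeled here.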

Anytime-Constrained Multi-Agent Reinforcement Learning

Multiagent Reinforcement Learning for Strictly Constrained Tasks Based on Reward Recorder

Anytime-Constrained Reinforcement Learning

Anytime-Competitive Reinforcement Learning with Policy Prior

Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning

Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium

Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

Capacity-Limited Decentralized Actor-Critic for Multi-Agent Games

Multi-Agent Constrained Policy Optimisation

Sample-Efficient Multi-Agent RL: an Optimization Perspective

Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game

Tractable Equilibrium Computation in Markov Games through Risk Aversion

A Risk-Averse Equilibrium for Multi-Agent Systems

Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning

Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

Multiagent Reinforcement Learning with Unshared Value Functions

Reinforcement Learning for Multi-Objective and Constrained Markov Decision Processes

Independent Learning in Constrained Markov Potential Games

Discrete-Time Mean Field Control with Environment States