Anytime-Constrained Multi-Agent Reinforcement Learning

Jeremy McMahan, Xiaojin Zhu
2024-10-31
Abstract: We introduce anytime constraints to the multi-agent setting with the corresponding solution concept being anytime-constrained equilibrium (ACE). Then, we present a comprehensive theory of anytime-constrained Markov games, which includes (1) a computational characterization of feasible policies, (2) a fixed-parameter tractable algorithm for computing ACE, and (3) a polynomial-time algorithm for approximately computing feasible ACE. Since computing a feasible policy is NP-hard even for two-player zero-sum games, our approximation guarantees are the best possible under worst-case analysis. We also develop the first theory of efficient computation for action-constrained Markov games, which may be of independent interest.
Machine Learning, Artificial Intelligence, Data Structures and Algorithms, Computer Science and Game Theory
What problem does this paper attempt to address?
This paper addresses anytime constraints in multi-agent reinforcement learning (MARL). Specifically, the authors introduce a new solution concept, the anytime-constrained equilibrium (ACE), and develop a comprehensive theory of it for the multi-agent setting.

### Main problems of the paper

The core question of the paper is: **For which classes of constrained Markov games (cMGs) can an (approximate) anytime-constrained equilibrium (ACE) be computed in polynomial time?**

### Detailed explanation

1. **Background and motivation**:
   - Constraints are important in real-world applications, but the existing MARL literature lags far behind the single-agent setting in handling them.
   - Anytime constraints suit scenarios such as autonomous vehicles, which must balance safety against other goals such as efficiency.
   - Anytime constraints have been studied in the single-agent setting, but not yet systematically in the multi-agent setting.

2. **Main contributions**:
   - **Computational characterization**: The paper gives a computational characterization of feasible policies.
   - **Fixed-parameter tractable algorithm**: It designs a fixed-parameter tractable (FPT) algorithm for computing subgame-perfect ACE.
   - **Polynomial-time approximation algorithm**: It provides a polynomial-time algorithm for approximately computing feasible subgame-perfect ACE.
   - **Efficient algorithms for action-constrained games**: It develops a theory of efficient computation for action-constrained Markov games (action-constrained MGs), which may be of independent interest.

3. **Technical difficulties**:
   - Computing a feasible policy is NP-hard even for simple two-player zero-sum games.
   - Policies that satisfy a constraint only in expectation may seriously violate anytime constraints, so the standard expected-constraint methods fail here.
   - Distributed learning and self-play methods usually depend on the choices of the other players, which makes feasibility difficult to guarantee.

4. **Solutions**:
   - The paper reduces the problem to an action-constrained Markov game and solves the resulting sequence of stage games with linear programming (LP).
   - For the approximation algorithm, it truncates and rounds the costs to derive an approximate game whose solution is an approximately feasible equilibrium of the original cMG.

### Summary

This paper aims to fill the gap in research on anytime constraints in multi-agent reinforcement learning. It proposes a complete theoretical framework and algorithmic tools: a fixed-parameter tractable algorithm for computing ACE exactly and a polynomial-time algorithm for computing approximately feasible ACE. These results provide new methods for multi-agent decision-making problems in complex real-world settings.
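The gap between expected constraints and anytime constraints, listed above among the technical difficulties, can be illustrated with a minimal sketch. It is a toy example, not the paper's algorithm. It assumes the usual reading of an anytime constraint: the cumulative cost must stay within the budget at every step on every trajectory that occurs with positive probability. The policy format and the names `trajectories`, `expected_cost`, and `satisfies_anytime` are hypothetical and chosen only for illustration.

```python
from itertools import product

def trajectories(policy, horizon):
    """Yield (probability, cost_sequence) for every positive-probability action sequence.

    `policy` maps an action name to (probability, per-step cost). The same
    action distribution is used at every step (a stateless toy policy).
    """
    actions = list(policy.values())
    for seq in product(actions, repeat=horizon):
        prob = 1.0
        costs = []
        for p, c in seq:
            prob *= p
            costs.append(c)
        if prob > 0:
            yield prob, costs

def expected_cost(policy, horizon):
    """Expected cumulative cost over the whole horizon."""
    return sum(prob * sum(costs) for prob, costs in trajectories(policy, horizon))

def satisfies_anytime(policy, horizon, budget):
    """True if the running cost never exceeds the budget on any reachable trajectory."""
    for _, costs in trajectories(policy, horizon):
        running = 0.0
        for c in costs:
            running += c
            if running > budget:
                return False
    return True

HORIZON = 2
BUDGET = 1.0

# Mixes a free action with a costly one at every step.
mixed = {"safe": (0.5, 0.0), "risky": (0.5, 1.0)}
# Never plays the costly action.
cautious = {"safe": (1.0, 0.0), "risky": (0.0, 1.0)}

for name, policy in [("mixed", mixed), ("cautious", cautious)]:
    print(
        name,
        "expected-feasible:", expected_cost(policy, HORIZON) <= BUDGET,
        "anytime-feasible:", satisfies_anytime(policy, HORIZON, BUDGET),
    )
```

In this sketch the `mixed` policy meets the budget in expectation, yet with probability 1/4 it plays the costly action twice and overshoots the budget, so it is not anytime-feasible. The `cautious` policy satisfies both notions.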